Failed to set ConnectorClientConfigOverridePolicy to All in Debezium MySQL connector in Docker - apache-kafka-connect

I am trying to set the ConnectorClientConfigOverridePolicy by adding CONNECTOR_CLIENT_CONFIG_OVERRIDE_POLICY=All. During startup the Debezium connector fails with "matches all=All". It seems that CONNECTOR_CLIENT_CONFIG_OVERRIDE_POLICY duplicates the value: instead of "All", the value becomes "all=All".
Stopping due to error [org.apache.kafka.connect.cli.ConnectDistributed]
org.apache.kafka.connect.errors.ConnectException: Failed to find any class that implements interface org.apache.kafka.connect.connector.policy.ConnectorClientConfigOverridePolicy and which name matches all=All
Is this a bug, or am I doing something wrong?
I am using the Debezium Docker image, version 1.5.

This is partly a bug and partly a misconfiguration.
The environment variable should be named CONNECT_CONNECTOR_CLIENT_CONFIG_OVERRIDE_POLICY, i.e. CONNECT_CONNECTOR_CLIENT_CONFIG_OVERRIDE_POLICY=All.
At the same time, the start script that processes all environment variables prefixed with CONNECT_ does not check for the trailing underscore, so CONNECTOR... matches too, which breaks the logic further down.
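For reference, a minimal docker run sketch with the correctly named variable; the broker address and topic names are placeholders to adapt to your setup, and the misnamed CONNECTOR_... variable should simply be removed:
docker run -d --name connect -p 8083:8083 \
  -e BOOTSTRAP_SERVERS=kafka:9092 \
  -e GROUP_ID=1 \
  -e CONFIG_STORAGE_TOPIC=connect_configs \
  -e OFFSET_STORAGE_TOPIC=connect_offsets \
  -e STATUS_STORAGE_TOPIC=connect_statuses \
  -e CONNECT_CONNECTOR_CLIENT_CONFIG_OVERRIDE_POLICY=All \
  debezium/connect:1.5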

Related

debezium content based routing configuration

I'm using Confluent, so I've installed the Debezium connectors according to the Confluent docs using confluent-hub. In connect.properties I have the entry:
plugin.path=/usr/share/java,/opt/confluent-6.0.0/share/confluent-hub-components
I need to use io.debezium.transforms.ContentBasedRouter (https://debezium.io/documentation/reference/1.3/configuration/content-based-routing.html),
so according to the Debezium docs I've downloaded debezium-scripting-1.3.1.Final.jar
and put it into
/opt/confluent-6.0.0/share/confluent-hub-components/ and also copied it into the
/opt/confluent-6.0.0/share/confluent-hub-components/debezium-debezium-connector-sqlserver/lib directory.
Here are the entries in my mysql_src.json connector config:
"transforms": "unwrap,route",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.add.fields": "source.snapshot",
"transforms.route.type": "io.debezium.transforms.ContentBasedRouter",
"transforms.route.language": "jsr223.groovy",
"transforms.route.topic.expression": "value.__source_snapshot == 'false' ? 'test'"
When I try to configure/load this connector, I get the following error message:
[2020-12-15 22:18:45,351] ERROR [Worker clientId=connect-1, groupId=connect-cluster] Failed to reconfigure connector's tasks, retrying after backoff: (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1369)
java.lang.NoClassDefFoundError: io/debezium/DebeziumException
Any suggestions on how to fix this problem?
According to the docs, you need to additionally obtain a JSR-223 script engine implementation and add its contents to the Debezium plug-in directories of your Kafka Connect environment, since:
Debezium does not come with any implementations of the JSR 223 API. To use an expression language with Debezium, you must download the JSR 223 script engine implementation for the language, and add it to your Debezium connector plug-in directories, along with any other JAR files used by the language implementation.
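For Groovy, for example, that means putting the Groovy runtime and its JSR-223 binding next to the debezium-scripting JAR. A minimal sketch, assuming Groovy 3.0.7 and the plug-in directory from the question:
cd /opt/confluent-6.0.0/share/confluent-hub-components/debezium-debezium-connector-sqlserver/lib
wget https://repo1.maven.org/maven2/org/codehaus/groovy/groovy/3.0.7/groovy-3.0.7.jar
wget https://repo1.maven.org/maven2/org/codehaus/groovy/groovy-jsr223/3.0.7/groovy-jsr223-3.0.7.jar
wget https://repo1.maven.org/maven2/org/codehaus/groovy/groovy-json/3.0.7/groovy-json-3.0.7.jar
Then restart the Kafka Connect worker so the new JARs are picked up.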
I am not sure that the configuration is correct, but I got past the first configuration problem (I hope). I'm facing another problem now, which I will describe in a different question.
I am not sure exactly what was wrong, but I did the following (a rough command sketch is shown below):
Cleaned up the ZooKeeper directories
Cleaned up the Kafka directories
Ran Kafka in distributed mode using the command-line start/stop scripts (not the Confluent CLI)
This solved the java.lang.NoClassDefFoundError: io/debezium/DebeziumException error.
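A rough sketch of those commands, run from the Kafka installation directory (the /tmp data paths are the defaults from zookeeper.properties and server.properties; adjust them to your setup):
rm -rf /tmp/zookeeper /tmp/kafka-logs
bin/zookeeper-server-start.sh -daemon config/zookeeper.properties
bin/kafka-server-start.sh -daemon config/server.properties
bin/connect-distributed.sh config/connect-distributed.properties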

Connect to Teradata Using Airflow JDBC Connection

I'm trying to execute a SqlSensor task in Airflow using a connection to a Teradata database. The connection is configured as follows:
In particular, I have provided two driver paths separated by ", ", but I am not sure if that's the proper way to do it:
/home/airflow/java_sample/tdgssconfig.jar
/home/airflow/java_sample/terajdbc4.jar
When the DAG executes, it triggers the following error message:
[2017-08-02 02:32:45,162] {models.py:1342} INFO - Executing <Task(SqlSensor): check_running_batch> on 2017-08-02 02:32:12
[2017-08-02 02:32:45,179] {base_hook.py:67} INFO - Using connection to: jdbc:teradata://myservername.mycompanyname.org/database=MYDBNAME,TMODE=ANSI,CHARSET=UTF8
[2017-08-02 02:32:45,313] {sensors.py:109} INFO - Poking: SELECT BATCH_KEY FROM MYDBNAME.AUDIT_BATCH WHERE BATCH_OWNER='ARO_TEST' AND AUDIT_STATUS_KEY=1;
[2017-08-02 02:32:45,316] {base_hook.py:67} INFO - Using connection to: jdbc:teradata://myservername.mycompanyname.org/database=MYDBNAME,TMODE=ANSI,CHARSET=UTF8
[2017-08-02 02:32:45,497] {models.py:1417} ERROR - java.lang.RuntimeException: Class com.teradata.jdbc.TeraDriver not found
What am I doing wrong?
The appropriate way to provide multiple JARs on the connections page is to separate the fully qualified paths with a comma, which you did above.
I can confirm this is the approach I took and it worked (Airflow 10.1.1 and 10.1.2).
See: https://github.com/apache/airflow/blob/master/airflow/hooks/jdbc_hook.py#L51
Bonus: If you use Ad Hoc Query in Data Profiling to test it out, you'll notice that you get an error when you send a SELECT statement, because Airflow wraps it in a LIMIT clause, which Teradata doesn't support.
The solution provided by my team member was to merge the two JARs into a single JAR file. After doing that and pointing the driver path at the new JAR file, it worked as expected.
Here is the link to the JAR file: https://github.com/alexisrolland/linux-setup/blob/master/teradataDriverJdbc.jar
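If you want to build such a merged JAR yourself, a minimal sketch (file names taken from the question; the output name is arbitrary):
mkdir merged && cd merged
unzip -o ../tdgssconfig.jar && unzip -o ../terajdbc4.jar
cd .. && jar -cvf teradata-merged.jar -C merged .
Point the connection's driver path at the resulting teradata-merged.jar.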
Here is a code snippet example showing how to use the connection in a SqlSensor task:
from airflow.sensors.sql_sensor import SqlSensor  # import path may differ on older Airflow versions

CheckRunningBatch = SqlSensor(
    task_id='check_running_batch',
    conn_id='ed_data_quality_edw_dev',
    sql="SELECT CASE WHEN MAX(BATCH_KEY) IS NOT NULL THEN 0 ELSE 1 END FROM DATABASE.AUDIT_BATCH WHERE STATUS_KEY=1;",
    poke_interval=300,
    dag=dag)

Hive Browser Throwing Error

I am trying to run some basic queries in the Hive editor in the Hue browser, but it returns the following error, whereas my Hive CLI works fine and is able to execute queries. Could someone help me?
Fetching results ran into the following error(s):
Bad status for request TFetchResultsReq(fetchType=1,
operationHandle=TOperationHandle(hasResultSet=True,
modifiedRowCount=None, operationType=0,
operationId=THandleIdentifier(secret='r\t\x80\xac\x1a\xa0K\xf8\xa4\xa0\x85?\x03!\x88\xa9',
guid='\x852\x0c\x87b\x7fJ\xe2\x9f\xee\x00\xc9\xeeo\x06\xbc')),
orientation=4, maxRows=-1):
TFetchResultsResp(status=TStatus(errorCode=0, errorMessage="Couldn't
find log associated with operation handle: OperationHandle
[opType=EXECUTE_STATEMENT,
getHandleIdentifier()=85320c87-627f-4ae2-9fee-00c9ee6f06bc]",
sqlState=None,
infoMessages=["*org.apache.hive.service.cli.HiveSQLException:Couldn't
find log associated with operation handle: OperationHandle
[opType=EXECUTE_STATEMENT,
getHandleIdentifier()=85320c87-627f-4ae2-9fee-00c9ee6f06bc]:24:23",
'org.apache.hive.service.cli.operation.OperationManager:getOperationLogRowSet:OperationManager.java:229',
'org.apache.hive.service.cli.session.HiveSessionImpl:fetchResults:HiveSessionImpl.java:687',
'sun.reflect.GeneratedMethodAccessor14:invoke::-1',
'sun.reflect.DelegatingMethodAccessorImpl:invoke:DelegatingMethodAccessorImpl.java:43',
'java.lang.reflect.Method:invoke:Method.java:606',
'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:78',
'org.apache.hive.service.cli.session.HiveSessionProxy:access$000:HiveSessionProxy.java:36',
'org.apache.hive.service.cli.session.HiveSessionProxy$1:run:HiveSessionProxy.java:63',
'java.security.AccessController:doPrivileged:AccessController.java:-2',
'javax.security.auth.Subject:doAs:Subject.java:415',
'org.apache.hadoop.security.UserGroupInformation:doAs:UserGroupInformation.java:1657',
'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:59',
'com.sun.proxy.$Proxy19:fetchResults::-1',
'org.apache.hive.service.cli.CLIService:fetchResults:CLIService.java:454',
'org.apache.hive.service.cli.thrift.ThriftCLIService:FetchResults:ThriftCLIService.java:672',
'org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1553',
'org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1538',
'org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39',
'org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39',
'org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56',
'org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:285',
'java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1145',
'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:615',
'java.lang.Thread:run:Thread.java:745'], statusCode=3), results=None,
hasMoreRows=None)
This error could be due either to HiveServer2 not running or to Hue not having access to hive_conf_dir.
Check whether HiveServer2 has been started and is running. It uses port 10000 by default:
netstat -ntpl | grep 10000
If it is not running, start HiveServer2:
$HIVE_HOME/bin/hiveserver2
Also check the Hue configuration file hue.ini. The hive_conf_dir property must be set under the [beeswax] section. If it is not set, add this property under [beeswax]:
hive_conf_dir=$HIVE_HOME/conf
Restart supervisor after making these changes.
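For example, the relevant part of hue.ini might look like this (host, port, and path are placeholders; point hive_conf_dir at the directory containing your hive-site.xml):
[beeswax]
  hive_server_host=localhost
  hive_server_port=10000
  hive_conf_dir=/usr/local/hive/conf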

libnetwork: Error: unknown command "/var/run/docker/netns/582bd184e561" for "some_app"

I am trying to set up a network in the container (using Docker's libnetwork and libcontainer), but I keep running into this issue. As far as I can tell, it's looking into some_app to get some sandbox information?
INFO[3808] No non-localhost DNS nameservers are left in resolv.conf. Using default external servers : [nameserver 8.8.8.8 nameserver 8.8.4.4]
INFO[3808] IPv6 enabled; Adding default IPv6 external servers : [nameserver 2001:4860:4860::8888 nameserver 2001:4860:4860::8844]
Error: unknown command "/var/run/docker/netns/582bd184e561" for "some_app"
Run 'some_app --help' for usage.
ERRO[3808] Resolver Setup/Start failed for container 6b81802576bd4f16aa117061f81b5c3e, "setup not done yet"
ERRO[3808] failed to add interface vethef0a693 to sandbox: failed in prefunc: failed to set namespace on link "vethef0a693": invalid argument
ERRO[3808] failed to add interface vethef0a693 to sandbox: failed in prefunc: failed to set namespace on link "vethef0a693": invalid argument
I was wondering if anyone could help me make sense of this and perhaps prevent it. Are these two separate errors?
Thank you
Here is the library I am trying to use
It took me a while to figure this out, but here goes:
Just like in Docker, libnetwork creates a veth interface pair. It then moves one end of the veth pair into the container namespace. During this process libnetwork tries to execute commands registered at runtime on the current instance of the binary (some_app in this case).
These commands do not exist on the external interface of some_app however. They are injected later using a library called reexec. For this to work, reexec needs to be initialized like this:
if reexec.Init() {
    return
}
Also note that according to this thread libnetwork is currently not supported for applications outside of Docker.
NB: I discovered this by reading the source code, so I might be wrong, but my issue went away after this.

Setting elasticsearch properties in spark-submit

I'm trying to launch Spark jobs that use Elasticsearch input from the command line using spark-submit, as described in http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/spark.html
I'm setting the properties in a file, but when launching spark-submit it gives the following warnings:
~/spark-1.0.1-bin-hadoop1/bin/spark-submit --class Main --properties-file spark.conf SparkES.jar
Warning: Ignoring non-spark config property: es.resource=myresource
Warning: Ignoring non-spark config property: es.nodes=mynode
Warning: Ignoring non-spark config property: es.query=myquery
...
Exception in thread "main" org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed
My config file looks like this (with the correct values filled in):
es.nodes nodeip:port
es.resource index/type
es.query query
Setting the properties in the Configuration object in the code works, but I need to avoid this workaround.
Is there a way to set those properties via command line?
I don't know if you resolved your issue (if so, how?), but I found this solution:
import org.elasticsearch.spark.rdd.EsSpark
EsSpark.saveToEs(rdd, "spark/docs", Map("es.nodes" -> "10.0.5.151"))
Bye
When you pass a config file to spark-submit, it only loads configs that start with 'spark.'
So, in my config I simply use
spark.es.nodes <es-ip>
and in the code itself I have to do
val conf = new SparkConf()
conf.set("es.nodes", conf.get("spark.es.nodes"))

Resources