Kafka import/export data in local system - Hadoop

When I try to import/export data using Kafka, I run:
bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties config/connect-file-sink.properties
But I get this error:
ERROR Failed to create job for config/connect-file-sink.properties (org.apache.kafka.connect.cli.ConnectStandalone:89)
[2017-05-10 21:26:49,110] ERROR Stopping after connector error (org.apache.kafka.connect.cli.ConnectStandalone:100)
java.util.concurrent.ExecutionException: org.apache.kafka.connect.errors.AlreadyExistsException: Connector local-file already exists
at org.apache.kafka.connect.util.ConvertingFutureCallback.result(ConvertingFutureCallback.java:80)
at org.apache.kafka.connect.util.ConvertingFutureCallback.get(ConvertingFutureCallback.java:67)
at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:97)
Caused by: org.apache.kafka.connect.errors.AlreadyExistsException: Connector local-file already exists
at org.apache.kafka.connect.runtime.standalone.StandaloneHerder.putConnectorConfig(StandaloneHerder.java:145)
at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:94)
Please help me solve this.

It looks like the two connectors you are trying to run in this worker share the same connector name:
Connector local-file already exists
Try changing the name property in one of your connector configuration files.
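A minimal sketch of that fix, assuming the stock quickstart connector files and illustrative connector names (adjust file names and values to your setup); the key point is just that the name value must differ between the two files:

# Check which name each connector file declares
grep '^name=' config/connect-file-source.properties config/connect-file-sink.properties

# Give the two connectors distinct names (illustrative values)
sed -i 's/^name=.*/name=local-file-source/' config/connect-file-source.properties
sed -i 's/^name=.*/name=local-file-sink/' config/connect-file-sink.properties

# Re-run the standalone worker with both connector configs
bin/connect-standalone.sh config/connect-standalone.properties \
  config/connect-file-source.properties config/connect-file-sink.properties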

Related

Debezium content based routing configuration

I'm using Confluent, so I've installed the Debezium connectors with confluent-hub, following the Confluent docs. In connect.properties I have the entry:
plugin.path=/usr/share/java,/opt/confluent-6.0.0/share/confluent-hub-components
I need to use io.debezium.transforms.ContentBasedRouter (https://debezium.io/documentation/reference/1.3/configuration/content-based-routing.html), so, following the Debezium docs, I downloaded debezium-scripting-1.3.1.Final.jar, put it into /opt/confluent-6.0.0/share/confluent-hub-components/, and also copied it into the /opt/confluent-6.0.0/share/confluent-hub-components/debezium-debezium-connector-sqlserver/lib directory.
Here are the relevant entries in my mysql_src.json connector config:
"transforms": "unwrap,route",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.add.fields": "source.snapshot",
"transforms.route.type": "io.debezium.transforms.ContentBasedRouter",
"transforms.route.language": "jsr223.groovy",
"transforms.route.topic.expression": "value.__source_snapshot == 'false' ? 'test'"
When I try to configure/load this connector, I get the following error message:
[2020-12-15 22:18:45,351] ERROR [Worker clientId=connect-1, groupId=connect-cluster] Failed to reconfigure connector's tasks, retrying after backoff: (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1369)
java.lang.NoClassDefFoundError: io/debezium/DebeziumException
Any suggestions on how to fix this problem?
According to the docs, you need to additionally obtain a JSR-223 script engine implementation and add its contents to the Debezium plug-in directories of your Kafka Connect environment, since:
Debezium does not come with any implementations of the JSR 223 API. To use an expression language with Debezium, you must download the JSR 223 script engine implementation for the language, and add it to your Debezium connector plug-in directories, along with any other JAR files used by the language implementation.
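For Groovy, that usually means dropping the Groovy language jars next to debezium-scripting in the connector's directory. A rough sketch, where the Groovy version and the plug-in path are assumptions to adjust for your environment:

# Assumed plug-in directory and Groovy version -- adjust to your install
PLUGIN_DIR=/opt/confluent-6.0.0/share/confluent-hub-components/debezium-debezium-connector-sqlserver/lib
GROOVY_VERSION=3.0.7

# Copy the JSR-223 script engine jars alongside debezium-scripting
cp groovy-${GROOVY_VERSION}.jar \
   groovy-json-${GROOVY_VERSION}.jar \
   groovy-jsr223-${GROOVY_VERSION}.jar \
   "${PLUGIN_DIR}/"

# Restart the Connect worker afterwards so the new jars are picked up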
I am not sure the configuration is correct, but I got past the first configuration problem (I hope). I'm now facing another problem, which I will describe in a different question.
I am not sure what exactly was wrong; I did the following:
Clean up the ZooKeeper directories
Clean up the Kafka directories
Run Kafka in distributed mode using the command-line start/stop scripts (not the Confluent CLI)
This solved the java.lang.NoClassDefFoundError: io/debezium/DebeziumException error.
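A rough sketch of those steps, where the data directories and script paths are assumptions (they are the stock Apache Kafka defaults; Confluent's bin/ scripts drop the .sh suffix and keep the properties files under etc/kafka/):

# Stop the services first, then clear the data directories
# (/tmp/zookeeper and /tmp/kafka-logs are the defaults from
#  zookeeper.properties and server.properties -- yours may differ)
rm -rf /tmp/zookeeper
rm -rf /tmp/kafka-logs

# Restart with the stock scripts instead of the Confluent CLI
bin/zookeeper-server-start.sh -daemon config/zookeeper.properties
bin/kafka-server-start.sh -daemon config/server.properties
bin/connect-distributed.sh config/connect-distributed.properties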

Cannot create Hive external table using jdbcStorageHandler

I am running a small cluster in Amazon EMR in order to play with Apache Hive 2.3.5. It is my understanding that Apache Hive can import data from a remote database and have the cluster run queries against it. I was following an example provided in the Apache Hive web documentation (https://cwiki.apache.org/confluence/display/Hive/JdbcStorageHandler) and wrote the following code:
CREATE EXTERNAL TABLE hive_table
(
col1 int,
col2 string,
col3 date
)
STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
TBLPROPERTIES (
'hive.sql.database.type'='POSTGRES',
'hive.sql.jdbc.driver'='org.postgresql.Driver',
'hive.sql.jdbc.url'='jdbc:postgresql://<url>/<dbname>',
'hive.sql.dbcp.username'='<username>',
'hive.sql.dbcp.password'='<password>',
'hive.sql.table'='<dbtable>',
'hive.sql.dbcp.maxActive'='1'
);
But I get the following error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException java.lang.IllegalArgumentException: Property hive.sql.query is required.)
According to the documentation, I need to specify either “hive.sql.table” or “hive.sql.query” to tell Hive how to get data from the JDBC database. But if I replace hive.sql.table with hive.sql.query, I get the following error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException java.lang.IllegalArgumentException: No enum constant org.apache.hive.storage.jdbc.conf.DatabaseType.POSTGRES)
I tried looking on the web for a solution, and it doesn't look like anyone else has experienced the same issue. Do I need to modify a config file, or am I missing something critical in my code?
I think you are using a version of the jar which doesn't support POSTGRES. Try the following (a command sketch follows these steps):
1. Download the latest jar from this link: http://repo1.maven.org/maven2/org/apache/hive/hive-jdbc-handler/3.1.2/hive-jdbc-handler-3.1.2.jar
2. Put the downloaded jar into an HDFS location.
3. Run Hive normally.
4. Run the command: add jar ${HDFS_PATH_TO_DOWNLOADED_JAR}
5. Run your CREATE TABLE command.
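A sketch of those steps as commands, with hypothetical local and HDFS paths of my own choosing (substitute your own):

# Download the newer handler jar (local path is a placeholder)
wget -O /tmp/hive-jdbc-handler-3.1.2.jar \
  http://repo1.maven.org/maven2/org/apache/hive/hive-jdbc-handler/3.1.2/hive-jdbc-handler-3.1.2.jar

# Put it in an HDFS location (placeholder directory)
hdfs dfs -mkdir -p /user/hive/aux-jars
hdfs dfs -put /tmp/hive-jdbc-handler-3.1.2.jar /user/hive/aux-jars/

# Inside the Hive session, before the CREATE TABLE statement:
#   add jar hdfs:///user/hive/aux-jars/hive-jdbc-handler-3.1.2.jar;
#   -- then run the CREATE EXTERNAL TABLE from the question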

Unable to create state store in Kafka streams

I am getting the error Failed to lock the state directory: /tmp/kafka-streams/string-monitor/0_1 while creating a state store in my Kafka Streams application. Here is the complete stack trace:
[2016-08-30 12:43:09,408] ERROR [StreamThread-1] User provided listener org.apache.kafka.streams.processor.internals.StreamThread$1 for group string-monitor failed on partition assignment (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
org.apache.kafka.streams.errors.ProcessorStateException: Error while creating the state manager
at org.apache.kafka.streams.processor.internals.AbstractTask.<init>(AbstractTask.java:71)
at org.apache.kafka.streams.processor.internals.StreamTask.<init>(StreamTask.java:86)
at org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:550)
at org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:577)
at org.apache.kafka.streams.processor.internals.StreamThread.access$000(StreamThread.java:68)
at org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:123)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:222)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$1.onSuccess(AbstractCoordinator.java:232)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$1.onSuccess(AbstractCoordinator.java:227)
at org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
at org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
at org.apache.kafka.clients.consumer.internals.RequestFuture$2.onSuccess(RequestFuture.java:182)
at org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
at org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:436)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:422)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:679)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:658)
at org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167)
at org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
at org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:426)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:278)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:243)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.ensurePartitionAssignment(ConsumerCoordinator.java:345)
at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:977)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:937)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:295)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:218)
Caused by: java.io.IOException: Failed to lock the state directory: /tmp/kafka-streams/string-monitor/0_1
at org.apache.kafka.streams.processor.internals.ProcessorStateManager.<init>(ProcessorStateManager.java:95)
at org.apache.kafka.streams.processor.internals.AbstractTask.<init>(AbstractTask.java:69)
... 32 more
Exception in thread "StreamThread-1" org.apache.kafka.streams.errors.StreamsException: Failed to rebalance
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:299)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:218)
Caused by: org.apache.kafka.streams.errors.ProcessorStateException: Error while creating the state manager
at org.apache.kafka.streams.processor.internals.AbstractTask.<init>(AbstractTask.java:71)
at org.apache.kafka.streams.processor.internals.StreamTask.<init>(StreamTask.java:86)
at org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:550)
at org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:577)
at org.apache.kafka.streams.processor.internals.StreamThread.access$000(StreamThread.java:68)
at org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:123)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:222)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$1.onSuccess(AbstractCoordinator.java:232)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$1.onSuccess(AbstractCoordinator.java:227)
at org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
at org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
at org.apache.kafka.clients.consumer.internals.RequestFuture$2.onSuccess(RequestFuture.java:182)
at org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
at org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:436)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:422)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:679)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:658)
at org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167)
at org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
at org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:426)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:278)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:243)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.ensurePartitionAssignment(ConsumerCoordinator.java:345)
at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:977)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:937)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:295)
... 1 more
Caused by: java.io.IOException: Failed to lock the state directory: /tmp/kafka-streams/string-monitor/0_1
at org.apache.kafka.streams.processor.internals.ProcessorStateManager.<init>(ProcessorStateManager.java:95)
at org.apache.kafka.streams.processor.internals.AbstractTask.<init>(AbstractTask.java:69)
... 32 more
I create the state store as below:
StateStoreSupplier avgStore = Stores.create("avgStore")
.withKeys(Serdes.String())
.withValues(Serdes.String())
.persistent()
.build();
Any idea how to fix this?
Did you configure multiple threads within your application instance? If yes, it may be due to a known issue in older versions of Kafka, where the underlying consumer used by Kafka Streams (in the app instance) can take too long to rebalance, causing itself to be kicked out of the consumer group (and hence triggering another consumer group rebalance) while it is still under the first rebalance process.
The following error messages in your stacktrace indicate that you are actually running into the problem I described above:
Exception in thread "StreamThread-1" org.apache.kafka.streams.errors.StreamsException: Failed to rebalance
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:299)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:218)
Caused by: org.apache.kafka.streams.errors.ProcessorStateException: Error while creating the state manager
The problem is summarized in this Apache Kafka ticket:
https://issues.apache.org/jira/browse/KAFKA-3758
A recent change to the underlying Kafka consumer client fixes this issue:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-62%3A+Allow+consumer+to+send+heartbeats+from+a+background+thread
However, it is not included in an official Kafka release yet and, as of today, is available only in Apache Kafka trunk. If you are able to run your app against Kafka trunk, you can verify whether this problem has already gone away.
I also saw this problem when the user didn't have permission to write to the default state.dir.
When I changed the following property to a directory with good permissions, everything was fine:
property.put(StreamsConfig.STATE_DIR_CONFIG, "{goodDir}")
This was observed in Kafka 0.10.2.
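For example (a sketch; the directory path is an arbitrary choice and the commands assume a Linux box), make sure the directory exists and is writable by the user running the Streams application, then point state.dir at it:

# Create a state directory owned by the user running the Streams app
sudo mkdir -p /var/lib/kafka-streams
sudo chown "$(whoami)" /var/lib/kafka-streams

# In the application config:
#   property.put(StreamsConfig.STATE_DIR_CONFIG, "/var/lib/kafka-streams");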
Just a note for those who receive the same Failed to lock the state directory exception and use Spring for Apache Kafka (spring-kafka) (example from the Spring docs):
It took me a while to figure out from the source code that Spring starts your built Streams automatically, so there is no need for you to manually do something like:
KafkaStreams streams = new KafkaStreams(builder, config);
streams.start();
I did it anyway and faced the aforementioned exception because, not surprisingly, there were two application instances running: one created by Spring, another by me.

Oracle Data Integrator SQL to HDFS IKM returns error

I am using ODI (12.1.3.0.0). I created a topology for the Oracle DB, which is OK, and I created a topology for HDFS using the File technology, which is where I think the problem is.
For the HDFS DataServer, I left the JDBC driver empty and filled the JDBC URL with hdfs://remotehostname:port
For the HDFS Physical Schema, I filled both Schema and Work Schema with /my/path
Then I created a Logical Schema and a Model. After that I created a Datastore under the model with these definitions:
Name: TestName
Resource Name: TESTFILE.txt
File Format: Fixed
After all of this, I created a project and a mapping under the project.
Finally, when I run the mapping, I see these errors:
ODI-1217: Session Oracle2HDFSMapping_Physical_SESS (15) fails with return code ODI-1298.
ODI-1226: Step Physical_STEP fails after 1 attempt(s).
ODI-1240: Flow Physical_STEP fails while performing a Add execute to Sqoop script-IKM SQL to HDFS File (Sqoop)- operation. This flow loads target table null.
ODI-1298: Serial task "SERIAL-MAP_MAIN- (10)" failed because child task "SERIAL-EU-GGUSER_UNIT (20)" is in error.
ODI-1298: Serial task "SERIAL-EU-GGUSER_UNIT (20)" failed because child task "Add execute to Sqoop script-IKM SQL to HDFS File (Sqoop)- (40)" is in error.
Caused By: java.io.IOException: Cannot run program "chmod": CreateProcess error=2, The system cannot find the file specified
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047)
at java.lang.Runtime.exec(Runtime.java:617)
at java.lang.Runtime.exec(Runtime.java:450)
at java.lang.Runtime.exec(Runtime.java:347)
at oracle.odi.runtime.agent.execution.cmd.OSCommandExecutor.execute(OSCommandExecutor.java:54)
at oracle.odi.runtime.agent.execution.cmd.OSCommandExecutor.execute(OSCommandExecutor.java:29)
at oracle.odi.runtime.agent.execution.TaskExecutionHandler.handleTask(TaskExecutionHandler.java:52)
at oracle.odi.runtime.agent.execution.SessionTask.processTask(SessionTask.java:203)
at oracle.odi.runtime.agent.execution.SessionTask.doExecuteTask(SessionTask.java:114)
at oracle.odi.runtime.agent.execution.AbstractSessionTask.execute(AbstractSessionTask.java:886)
at oracle.odi.runtime.agent.execution.SessionExecutor$SerialTrain.runTasks(SessionExecutor.java:2198)
at oracle.odi.runtime.agent.execution.SessionExecutor.executeSession(SessionExecutor.java:591)
at oracle.odi.runtime.agent.processor.TaskExecutorAgentRequestProcessor$1.doAction(TaskExecutorAgentRequestProcessor.java:718)
at oracle.odi.runtime.agent.processor.TaskExecutorAgentRequestProcessor$1.doAction(TaskExecutorAgentRequestProcessor.java:611)
at oracle.odi.core.persistence.dwgobject.DwgObjectTemplate.execute(DwgObjectTemplate.java:203)
at oracle.odi.runtime.agent.processor.TaskExecutorAgentRequestProcessor.doProcessStartAgentTask(TaskExecutorAgentRequestProcessor.java:800)
at oracle.odi.runtime.agent.processor.impl.StartSessRequestProcessor.access$1400(StartSessRequestProcessor.java:74)
at oracle.odi.runtime.agent.processor.impl.StartSessRequestProcessor$StartSessTask.doExecute(StartSessRequestProcessor.java:702)
at oracle.odi.runtime.agent.processor.task.AgentTask.execute(AgentTask.java:180)
at oracle.odi.runtime.agent.support.DefaultAgentTaskExecutor$2.run(DefaultAgentTaskExecutor.java:108)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: CreateProcess error=2, The system cannot find the file specified
at java.lang.ProcessImpl.create(Native Method)
at java.lang.ProcessImpl.<init>(ProcessImpl.java:385)
at java.lang.ProcessImpl.start(ProcessImpl.java:136)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028)
... 20 more
I wonder where I went wrong?
For a file Datastore, you need to define the attributes (columns) by opening the Datastore and going to the Attributes tab. If the file already exists, you can reverse-engineer the attributes, rename them, and change the datatypes if needed.
The error message you received for the second task mentions that the file (generated in the first task) does not exist. So there might be a problem with the first task, probably due to the missing attributes in your Datastore.
Here is a detailed article about the SQL to HDFS File (Sqoop) KM written by the ODI A-Team: http://www.ateam-oracle.com/importing-data-from-sql-databases-into-hadoop-with-sqoop-and-oracle-data-integrator-odi/

HPCC/HDFS Connector

Does anyone know about the HPCC/HDFS connector? We are using both HPCC and Hadoop. There is a utility (the HPCC/HDFS connector) developed by HPCC which allows an HPCC cluster to access HDFS data.
I have installed the connector, but when I run the program to access data from HDFS it gives an error saying libhdfs.so.0 doesn't exist.
I tried to build libhdfs.so using the command
ant compile-libhdfs -Dlibhdfs=1
It gives me the error:
target "compile-libhdfs" does not exist in the project "hadoop"
I used one more command:
ant compile-c++-libhdfs -Dlibhdfs=1
It gives the error:
ivy-download:
[get] Getting: http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0/ivy-2.1.0.jar
[get] To: /home/hadoop/hadoop-0.20.203.0/ivy/ivy-2.1.0.jar
[get] Error getting http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0/ivy-2.1.0.jar
to /home/hadoop/hadoop-0.20.203.0/ivy/ivy-2.1.0.jar
BUILD FAILED java.net.ConnectException: Connection timed out
Any suggestion would be a great help.
Chhaya, you might not need to build libhdfs.so; depending on how you installed Hadoop, you might already have it.
Check in HADOOP_LOCATION/c++/Linux-<arch>/lib/libhdfs.so, where HADOOP_LOCATION is your Hadoop install location, and arch is the machine’s architecture (i386-32 or amd64-64).
Once you locate the lib, make sure the H2H connector is configured correctly (see page 4 here).
It's just a matter of updating the HADOOP_LOCATION var in the config file:
/opt/HPCCSystems/hdfsconnector.conf
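For example, a sketch using the Hadoop location from your question (verify the actual path on your machine):

# Locate libhdfs.so under your Hadoop install (layout varies by version/arch)
find /home/hadoop/hadoop-0.20.203.0/c++ -name 'libhdfs.so*'

# Then point the connector at that install by editing
# /opt/HPCCSystems/hdfsconnector.conf, e.g.:
#   HADOOP_LOCATION=/home/hadoop/hadoop-0.20.203.0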
Good luck.
