I'm trying to instantiate JanusGraph with the following configuration, using Cassandra as storage backend and ElasticSearch as indexing backend:
JanusGraph graph = JanusGraphFactory.build()
.set("storage.backend", "cassandra")
.set("storage.hostname", "localhost")
.set("cache.db-cache", true)
.set("schema.default", "none")
.set("index.search.backend", "elasticsearch")
.set("index.search.elasticsearch.client-only", "false")
.set("index.search.elasticsearch.local-mode", "true")
.open();
The above code works if Cassandra's cluster is named Test Cluster. If I rename it to something else, an exception is thrown:
java.lang.IllegalArgumentException: Could not instantiate implementation: org.janusgraph.diskstorage.es.ElasticSearchIndex
at org.janusgraph.util.system.ConfigurationUtil.instantiate(ConfigurationUtil.java:69)
at org.janusgraph.diskstorage.Backend.getImplementationClass(Backend.java:477)
at org.janusgraph.diskstorage.Backend.getIndexes(Backend.java:464)
at org.janusgraph.diskstorage.Backend.<init>(Backend.java:149)
at org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration.getBackend(GraphDatabaseConfiguration.java:1850)
at org.janusgraph.graphdb.database.StandardJanusGraph.<init>(StandardJanusGraph.java:134)
at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:107)
at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:97)
at org.janusgraph.core.JanusGraphFactory$Builder.open(JanusGraphFactory.java:152)
at engineering.divine.core.GraphFactory.cassandraGraph(GraphFactory.java:403)
at engineering.divine.core.GraphFactory.graph(GraphFactory.java:298)
at engineering.divine.core.GraphFactory.getDefault(GraphFactory.java:99)
at engineering.divine.repository.Repository.listRepositoriesToUpdate(Repository.java:130)
at engineering.divine.daemon.RepositoryAnalysisDaemon.run(RepositoryAnalysisDaemon.java:24)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.janusgraph.util.system.ConfigurationUtil.instantiate(ConfigurationUtil.java:58)
... 20 more
Caused by: org.elasticsearch.client.transport.NoNodeAvailableException: None of the configured nodes are available: []
at org.elasticsearch.client.transport.TransportClientNodesService.ensureNodesAreAvailable(TransportClientNodesService.java:279)
at org.elasticsearch.client.transport.TransportClientNodesService.execute(TransportClientNodesService.java:198)
at org.elasticsearch.client.transport.support.InternalTransportClusterAdminClient.execute(InternalTransportClusterAdminClient.java:86)
at org.elasticsearch.client.support.AbstractClusterAdminClient.health(AbstractClusterAdminClient.java:127)
at org.elasticsearch.action.admin.cluster.health.ClusterHealthRequestBuilder.doExecute(ClusterHealthRequestBuilder.java:92)
at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:91)
at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:65)
at org.janusgraph.diskstorage.es.ElasticSearchIndex.<init>(ElasticSearchIndex.java:215)
... 25 more
How can I make elasticsearch work with my new cluster name?
Using Mac OS X 10.11.6; any pointers are highly appreciated.
If this is for testing purposes, reset your data:
Clear all data from the storage backend (Cassandra)
Restart all the JanusGraph nodes
In JanusGraph:
Each configuration option has a certain mutability level that governs whether and how it can be modified after the database is opened for the first time. The following listing describes the mutability levels.
FIXED
Once the database has been opened, these configuration options cannot be changed for the entire life of the database
GLOBAL_OFFLINE
These options can only be changed for the entire database cluster at once when all instances are shut down
GLOBAL
These options can only be changed globally across the entire database cluster
MASKABLE
These options are global but can be overwritten by a local configuration file
LOCAL
These options can only be provided through a local configuration file
You can look up the mutability level of any configuration option at the link below.
Source: http://docs.janusgraph.org/latest/config-ref.html
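If you need to keep your data, GLOBAL_OFFLINE options can instead be changed through the management API once every other JanusGraph instance has been shut down. A rough sketch (the properties file and the option being changed are placeholders; substitute whatever you actually need to modify):

import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;
import org.janusgraph.core.schema.JanusGraphManagement;

// Sketch only: open the graph from the single remaining instance,
// change the GLOBAL_OFFLINE option, commit, and close.
JanusGraph graph = JanusGraphFactory.open("conf/janusgraph-cassandra-es.properties");
JanusGraphManagement mgmt = graph.openManagement();
mgmt.set("index.search.hostname", "127.0.0.1");  // placeholder option and value
mgmt.commit();
graph.close();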
Related
I have an application running on 4 nodes inside 2 clusters. The application has caching configured using Infinispan and SpringEmbeddedCacheManager. I am getting an intermittent issue when I try to add data to the cache; please note that I am adding data as key-value pairs where the value is always an instance of a custom class.
I tried changing the cache type to replicated, local, and invalidation, and I observed that I do not have the issue when using a local or invalidation cache. Can anyone confirm whether large objects in a distributed cache can cause this issue?
Infinispan Config
<distributed-cache name="apigw-access-cache" owners="1" segments="20" mode="SYNC" statistics="false">
<eviction max-entries="10" strategy="LIRS"/>
<expiration max-idle="360000" lifespan="3600000"/>
</distributed-cache>
Infinispan Version
<dependency>
<groupId>org.infinispan</groupId>
<artifactId>infinispan-spring4</artifactId>
<version>7.0.3.Final</version><!--$NO-MVN-MAN-VER$ -->
</dependency>
<dependency>
<groupId>org.infinispan</groupId>
<artifactId>infinispan-cli-server</artifactId>
<version>7.0.0.CR1</version>
</dependency>
Errors
2019-12-04 09:44:23.361 [qtp1933072581-15447] ERROR o.i.i.InvocationContextInterceptor - ISPN000136: Execution error
org.infinispan.remoting.RemoteException: ISPN000217: Received exception from node-10097-32028, see cause for remote stack trace
at org.infinispan.remoting.transport.AbstractTransport.checkResponse(AbstractTransport.java:44) ~[infinispan-core-7.0.3.Final.jar!/:7.0.3.Final]
at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.processSingleCall(CommandAwareRpcDispatcher.java:381) ~[infinispan-core-7.0.3.Final.jar!/:7.0.3.Final]
at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:167) ~[infinispan-core-7.0.3.Final.jar!/:7.0.3.Final]
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:560) ~[infinispan-core-7.0.3.Final.jar!/:7.0.3.Final]
at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:290) ~[infinispan-core-7.0.3.Final.jar!/:7.0.3.Final]
Caused by: java.lang.IllegalArgumentException: Can not set java.util.Set field Class.field to java.lang.String
at sun.reflect.UnsafeFieldAccessorImpl.throwSetIllegalArgumentException(UnsafeFieldAccessorImpl.java:167) ~[na:1.8.0_121]
at sun.reflect.UnsafeFieldAccessorImpl.throwSetIllegalArgumentException(UnsafeFieldAccessorImpl.java:171) ~[na:1.8.0_121]
at sun.reflect.UnsafeObjectFieldAccessorImpl.set(UnsafeObjectFieldAccessorImpl.java:81) ~[na:1.8.0_121]
at java.lang.reflect.Field.set(Field.java:764) ~[na:1.8.0_121]
Caused by: org.infinispan.commons.CacheException: Problems invoking command.
at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.handle(CommandAwareRpcDispatcher.java:221)
at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:460)
at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:377)
Caused by: org.infinispan.commons.CacheException: Problems invoking command.
at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.handle(CommandAwareRpcDispatcher.java:221) ~[infinispan-core-7.0.3.Final.jar!/:7.0.3.Final]
at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:460) ~[jgroups-3.6.1.Final.jar!/:3.6.1.Final]
at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:377) ~[jgroups-3.6.1.Final.jar!/:3.6.1.Final]
First off, you should not be using such an old version of Infinispan; you should upgrade to 9.4.17.Final.
The stack trace fragments don't appear to be in the right order, but Can not set java.util.Set field Class.field to java.lang.String occurs because two of your nodes have different versions of the same class.
The biggest difference between distributed and invalidation caches is that distributed caches replicate values to other nodes, while invalidation caches send an invalidation message that includes only the key. If an invalidation cache works, then the problem is almost certainly that one of your value classes has changed and one of the nodes still has the old version.
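If an invalidation cache does fit your use case, a rough sketch of defining one programmatically with the standard Infinispan API is shown below (the cache name mirrors your XML; everything else is illustrative):

import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.configuration.global.GlobalConfigurationBuilder;
import org.infinispan.manager.DefaultCacheManager;

// Illustrative sketch: an invalidation cache only ships keys across the cluster,
// so value classes are never deserialized on the other nodes.
Configuration invalidationConfig = new ConfigurationBuilder()
        .clustering().cacheMode(CacheMode.INVALIDATION_SYNC)
        .build();

DefaultCacheManager cacheManager =
        new DefaultCacheManager(GlobalConfigurationBuilder.defaultClusteredBuilder().build());
cacheManager.defineConfiguration("apigw-access-cache", invalidationConfig);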
The Elasticsearch server doesn't start on a new node. It fails with the following error:
[2019-06-27T00:16:01,471][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [node-10] fatal error in thread [main], exiting
java.lang.ExceptionInInitializerError: null
at java.lang.Class.forName0(Native Method) ~[?:1.8.0_212]
at java.lang.Class.forName(Class.java:348) ~[?:1.8.0_212]
at org.elasticsearch.painless.Definition.addStruct(Definition.java:753) ~[?:?]
at org.elasticsearch.painless.Definition.<init>(Definition.java:566) ~[?:?]
at org.elasticsearch.painless.PainlessScriptEngine.<init>(PainlessScriptEngine.java:106) ~[?:?]
at org.elasticsearch.painless.PainlessPlugin.getScriptEngine(PainlessPlugin.java:59) ~[?:?]
at org.elasticsearch.script.ScriptModule.<init>(ScriptModule.java:69) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.node.Node.<init>(Node.java:327) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.node.Node.<init>(Node.java:246) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:213) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:213) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:323) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:121) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:112) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:124) ~[elasticsearch-cli-6.2.2.jar:6.2.2]
at org.elasticsearch.cli.Command.main(Command.java:90) ~[elasticsearch-cli-6.2.2.jar:6.2.2]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:92) ~[elasticsearch-6.2.2.jar:6.2.2]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:85) ~[elasticsearch-6.2.2.jar:6.2.2]
Caused by: java.lang.ArrayIndexOutOfBoundsException: 4
at java.time.chrono.JapaneseEra.<clinit>(JapaneseEra.java:179) ~[?:1.8.0_212]
... 19 more
I have a 5-node cluster already running in production in GCP. Since the load has increased, I tried to add a few more nodes to that cluster. To create the new nodes, I used the "Create similar" option provided by GCP. I updated all the configuration, such as the node name, deleted the /var/lib/elasticsearch/nodes folder, and tried to start the ES server on the new node. But it always fails with the error mentioned above.
This node uses OpenJDK 1.8. I enabled trace logging for the root logger but couldn't identify what's wrong with the new node.
Please help me identify the root cause of this problem.
Theory explanation:
public class ArrayIndexOutOfBoundsException extends IndexOutOfBoundsException
Thrown to indicate that an array has been accessed with an illegal index. The index is either negative or greater than or equal to the size of the array.
How to avoid:
Always remember that arrays use zero-based indexing: the first element is at index 0 and the last element is at index length - 1.
So accessing the first element of an empty array will give you the
java.lang.ArrayIndexOutOfBoundsException: 0 error in Java.
In your case it is Caused by: java.lang.ArrayIndexOutOfBoundsException: 4
You should always pay attention to off-by-one errors while looping over an array in Java. Programmers often make mistakes that result in either missing the first or last element of the array, or stepping one past the last element, by incorrectly using the <, >, >=, or <= operators in for loops.
Give special attention to the start and end condition of the loop.
Put some if-else blocks in code.
Here is a minimal sketch that reproduces the same kind of error (a generic illustration, not the Elasticsearch code path):
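// Minimal illustration: accessing index 4 of a 4-element array
// throws java.lang.ArrayIndexOutOfBoundsException: 4.
public class ArrayIndexDemo {
    public static void main(String[] args) {
        int[] values = {10, 20, 30, 40};   // valid indexes are 0..3
        System.out.println(values[4]);     // throws ArrayIndexOutOfBoundsException: 4
    }
}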
I am using the Confluent HDFS sink connector 5.0.0 with Kafka 2.0.0 and I need to use the ExtractTopic transformation (https://docs.confluent.io/current/connect/transforms/extracttopic.html). My connector works fine, but when I add this transformation I get a NullPointerException, even on a simple data sample with only 2 attributes.
ERROR Task hive-table-test-0 threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerSinkTask:482)
java.lang.NullPointerException
at io.confluent.connect.hdfs.DataWriter.write(DataWriter.java:352)
at io.confluent.connect.hdfs.HdfsSinkTask.put(HdfsSinkTask.java:109)
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:464)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:265)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:182)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:150)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:146)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:190)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Here is configuration of connector:
name=hive-table-test
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=hive_table_test
key.converter=io.confluent.connect.avro.AvroConverter
value.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=${env.SCHEMA_REGISTRY_URL}
value.converter.schema.registry.url=${env.SCHEMA_REGISTRY_URL}
schema.compatibility=BACKWARD
# HDFS configuration
# Use store.url instead of hdfs.url (deprecated) in later versions. Property store.url does not work, yet
hdfs.url=${env.HDFS_URL}
hadoop.conf.dir=/etc/hadoop/conf
hadoop.home=/opt/cloudera/parcels/CDH/lib/hadoop
topics.dir=${env.HDFS_TOPICS_DIR}
# Connector configuration
format.class=io.confluent.connect.hdfs.avro.AvroFormat
flush.size=100
rotate.interval.ms=60000
# Hive integration
hive.integration=true
hive.metastore.uris=${env.HIVE_METASTORE_URIS}
hive.conf.dir=/etc/hive/conf
hive.home=/opt/cloudera/parcels/CDH/lib/hive
hive.database=kafka_connect
# Transformations
transforms=InsertMetadata, ExtractTopic
transforms.InsertMetadata.type=org.apache.kafka.connect.transforms.InsertField$Value
transforms.InsertMetadata.partition.field=partition
transforms.InsertMetadata.offset.field=offset
transforms.ExtractTopic.type=io.confluent.connect.transforms.ExtractTopic$Value
transforms.ExtractTopic.field=name
transforms.ExtractTopic.skip.missing.or.null=true
I am using the Schema Registry, the data is in Avro format, and I am sure the given attribute name is not null. Any suggestions? What I need is basically to extract the content of a given field and use it as the topic name.
EDIT:
It happens even on a simple record like this (shown as JSON, sent in Avro format):
{
"attr": "tmp",
"name": "topic1"
}
The short answer is: because you change the name of the topic in your transformation.
The HDFS connector keeps a separate TopicPartitionWriter for each topic partition. When the SinkTask responsible for processing messages is created, a TopicPartitionWriter is created for each partition in the open(...) method.
When it processes SinkRecords, it looks up the TopicPartitionWriter based on the topic name and partition number and tries to append the record to its buffer. In your case it couldn't find a writer for the message: the topic name was changed by the transformation, so no TopicPartitionWriter had been created for that (topic, partition) pair.
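A simplified sketch of that lookup (not the actual Confluent source, just to illustrate why the renamed topic leads to the NullPointerException):

import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.connect.sink.SinkRecord;

public class WriterLookupSketch {
    // Stand-in for the connector's TopicPartitionWriter; only here so the sketch compiles.
    static class PartitionWriter {
        void buffer(SinkRecord record) { /* append to an in-memory buffer */ }
    }

    // Populated in open(...) with the ORIGINAL (topic, partition) pairs.
    private final Map<TopicPartition, PartitionWriter> writers = new HashMap<>();

    public void write(SinkRecord record) {
        // ExtractTopic has already rewritten record.topic(), so nothing was
        // registered under the new name and get(...) returns null.
        PartitionWriter writer =
                writers.get(new TopicPartition(record.topic(), record.kafkaPartition()));
        writer.buffer(record); // NullPointerException, matching DataWriter.write(DataWriter.java:352)
    }
}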
The SinkRecords passed to HdfsSinkTask::put(Collection<SinkRecord> records) already have their partition and topic set, so you don't have to apply any transformations.
I think io.confluent.connect.transforms.ExtractTopic should rather be used with a SourceConnector.
I'm working on a custom load function to load data from Bigtable using Pig on Dataproc. I compile my Java code using the following list of jar files I grabbed from Dataproc. When I run the following Pig script, it fails when it tries to establish a connection with Bigtable.
Error message is:
Bigtable does not support managed connections.
Questions:
Is there a work around for this problem?
Is this a known issue and is there a plan to fix or adjust?
Is there a different way of implementing multi scans as a load function for Pig that will work with Bigtable?
Details:
Jar files:
hadoop-common-2.7.3.jar
hbase-client-1.2.2.jar
hbase-common-1.2.2.jar
hbase-protocol-1.2.2.jar
hbase-server-1.2.2.jar
pig-0.16.0-core-h2.jar
Here's a simple Pig script using my custom load function:
%default gte '2017-03-23T18:00Z'
%default lt '2017-03-23T18:05Z'
%default SHARD_FIRST '00'
%default SHARD_LAST '25'
%default GTE_SHARD '$gte\_$SHARD_FIRST'
%default LT_SHARD '$lt\_$SHARD_LAST'
raw = LOAD 'hbase://events_sessions'
USING com.eduboom.pig.load.HBaseMultiScanLoader('$GTE_SHARD', '$LT_SHARD', 'event:*')
AS (es_key:chararray, event_array);
DUMP raw;
My custom load function HBaseMultiScanLoader creates a list of Scan objects to perform multiple scans on different ranges of data in the table events_sessions determined by the time range between gte and lt and sharded by SHARD_FIRST through SHARD_LAST.
HBaseMultiScanLoader extends org.apache.pig.LoadFunc so it can be used in the Pig script as load function.
When Pig runs my script, it calls LoadFunc.getInputFormat().
My implementation of getInputFormat() returns an instance of my custom class MultiScanTableInputFormat which extends org.apache.hadoop.mapreduce.InputFormat.
MultiScanTableInputFormat initializes an org.apache.hadoop.hbase.client.HTable object to initialize the connection to the table.
Digging into the hbase-client source code, I see that org.apache.hadoop.hbase.client.ConnectionManager.getConnectionInternal() calls org.apache.hadoop.hbase.client.ConnectionManager.createConnection() with the attribute "managed" hardcoded to "true".
You can see from the stack trace below that my code (MultiScanTableInputFormat) tries to initialize an HTable object, which invokes getConnectionInternal(), which does not provide an option to set managed to false.
Going down the stack trace, you will get to AbstractBigtableConnection, which will not accept managed=true and therefore causes the connection to Bigtable to fail.
Here’s the stack trace showing the error:
2017-03-24 23:06:44,890 [JobControl] ERROR com.turner.hbase.mapreduce.MultiScanTableInputFormat - java.io.IOException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:240)
at org.apache.hadoop.hbase.client.ConnectionManager.createConnection(ConnectionManager.java:431)
at org.apache.hadoop.hbase.client.ConnectionManager.createConnection(ConnectionManager.java:424)
at org.apache.hadoop.hbase.client.ConnectionManager.getConnectionInternal(ConnectionManager.java:302)
at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:185)
at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:151)
at com.eduboom.hbase.mapreduce.MultiScanTableInputFormat.setConf(Unknown Source)
at com.eduboom.pig.load.HBaseMultiScanLoader.getInputFormat(Unknown Source)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:264)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
at org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:194)
at java.lang.Thread.run(Thread.java:745)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:276)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:238)
... 26 more
Caused by: java.lang.IllegalArgumentException: Bigtable does not support managed connections.
at org.apache.hadoop.hbase.client.AbstractBigtableConnection.<init>(AbstractBigtableConnection.java:123)
at com.google.cloud.bigtable.hbase1_2.BigtableConnection.<init>(BigtableConnection.java:55)
... 31 more
The original problem was caused by the use of outdated and deprecated HBase client jars and classes.
I updated my code to use the newest HBase client jars provided by Google, and the original problem was fixed.
I am still stuck on a ZooKeeper issue that I have not figured out yet, but that's a conversation for a different question.
This one is answered!
I ran into the same error message:
Bigtable does not support managed connections.
However, according to my research, the root cause is that the HTable class cannot be constructed explicitly. After changing the code to obtain the table via connection.getTable, the problem was resolved.
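For reference, a rough sketch of that approach using the standard HBase 1.x client API (the table name and configuration are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Table;

// Instead of "new HTable(conf, tableName)", obtain the table from an
// explicitly created, unmanaged Connection.
Configuration conf = HBaseConfiguration.create();
Connection connection = ConnectionFactory.createConnection(conf);
Table table = connection.getTable(TableName.valueOf("events_sessions"));
// ... use the table, then close both when done
table.close();
connection.close();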
I just started Cascading programming and have a Cascading job which needs to run a variable number of iterations. During each iteration, it reads from a file (Tap) generated by the previous iteration and writes calculated data to two separate sink Taps.
One Tap (Tap final) is used to collect data from each iteration.
The other Tap (Tap intermediate) is used to collect data that needs to be calculated in the next iteration.
I am using SinkMode.UPDATE for "Tap final" to make this happen. It works correctly in local mode but fails in cluster mode, complaining that the file already exists ("Tap final").
I am running CDH4.4 and Cascading 2.5.2. It seems no one else has experienced the same problem.
If anyone knows a possible way to fix it, please let me know. Thanks.
Caused by: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://dv-db.machines:8020/tmp/xxxx/cluster/97916 already exists
at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:126)
at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:419)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:332)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1269)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1266)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1266)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:606)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:601)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:601)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:586)
at cascading.flow.hadoop.planner.HadoopFlowStepJob.internalNonBlockingStart(HadoopFlowStepJob.java:105)
at cascading.flow.planner.FlowStepJob.blockOnJob(FlowStepJob.java:196)
at cascading.flow.planner.FlowStepJob.start(FlowStepJob.java:149)
at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:124)
at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:43)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
It would be helpful for understanding the issue if you could add your Cascading flow code to your question.
It seems a job file with the same name is being used by different jobs in cluster mode. One simple solution, in case you are fine with not running them concurrently, would be to set the maximum concurrent steps to 1.
Flow flow = flowConnector.connect("name", sources, sinks, outPipe1, outPipe2);
flow.setMaxConcurrentSteps(jobProperties, 1);
UPDATE only works with sinks (like databases) that support in-place updating.
If you're using Hfs (a file system sink) then you'll need to use SinkMode.REPLACE.
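For example, a rough sketch of declaring the final sink with REPLACE (the scheme and path are placeholders):

import cascading.scheme.hadoop.TextLine;
import cascading.tap.SinkMode;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;

// REPLACE deletes any existing output directory before the step runs,
// avoiding the FileAlreadyExistsException seen in cluster mode.
Tap finalTap = new Hfs(new TextLine(), "hdfs://dv-db.machines:8020/tmp/final-output", SinkMode.REPLACE);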