I have used zohmg and successfully created a mapper and a table in HBase, and test-imported my data (using the --local switch).
But I have a problem inserting my mapped data into HBase; this is the error I get:
Exception in thread "main" java.lang.RuntimeException: class org.apache.hadoop.hbase.mapreduce.TableOutputFormat not org.apache.hadoop.mapred.OutputFormat
at org.apache.hadoop.conf.Configuration.setClass(Configuration.java:1034)
at org.apache.hadoop.mapred.JobConf.setOutputFormat(JobConf.java:471)
at org.apache.hadoop.streaming.StreamJob.setJobConf(StreamJob.java:818)
at org.apache.hadoop.streaming.StreamJob.run(StreamJob.java:122)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:50)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
Do you have any clues that might help me fix it? (Or maybe this is a problem with Dumbo?)
Hadoop 0.20 introduced a new package, org.apache.hadoop.mapreduce, and deprecated the old one, org.apache.hadoop.mapred. HBase 0.20 followed suit with its MapReduce support. It looks like this code is expecting an OutputFormat for the old API, but getting the HBase TableOutputFormat for the new API.
It looks like the latest commit over at GitHub may help with this; it says "added patch for reverting to the old api." (It also looks like there hasn't been any activity for a while.)
http://github.com/zohmg/zohmg/commits/master
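For reference, here is the distinction in code, as a minimal sketch assuming the HBase 0.20 jars (where both packages ship a TableOutputFormat; "mytable" is a placeholder):

    import org.apache.hadoop.mapred.JobConf;

    public class OldApiOutputFormatExample {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            // Old-API output table config key; "mytable" is a placeholder.
            conf.set(org.apache.hadoop.hbase.mapred.TableOutputFormat.OUTPUT_TABLE, "mytable");
            // JobConf.setOutputFormat() only accepts subclasses of
            // org.apache.hadoop.mapred.OutputFormat; the old-API HBase class qualifies.
            conf.setOutputFormat(org.apache.hadoop.hbase.mapred.TableOutputFormat.class);
            // The new-API org.apache.hadoop.hbase.mapreduce.TableOutputFormat implements
            // org.apache.hadoop.mapreduce.OutputFormat instead. Streaming sets the class
            // by name via Configuration.setClass(), which is why the mismatch only shows
            // up at runtime, as the RuntimeException above.
        }
    }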
According to the documentation, there is an option to use an existing Dataproc cluster in version 6.2 and above.
We use Cloud Data Fusion 6.2.0, but the existing Dataproc option does not appear when we try to create a new compute profile.
What are we doing wrong? Why does the described option not show up? Do we have to do some additional configuration?
UPDATE 1
When I choose Dataproc, I see the following:
UPDATE 2
When we try to use the Remote Hadoop provisioner, we get the following error message in the /logs/program.log file. The SSH connection itself is successful, because the run-id folder is there.
2021-06-15 09:40:37,617 - ERROR [main:o.a.z.s.NIOServerCnxnFactory#44] - Thread Thread[main,5,main] died
java.lang.reflect.InvocationTargetException: null
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_282]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_282]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_282]
at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_282]
at io.cdap.cdap.internal.app.runtime.distributed.remote.RemoteLauncher.main(RemoteLauncher.java:73) ~[launcher.jar:na]
Caused by: java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357) ~[hadoop-common-3.2.2.jar:na]
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338) ~[hadoop-common-3.2.2.jar:na]
at io.cdap.cdap.common.conf.CConfigurationUtil.copyTxProperties(CConfigurationUtil.java:100) ~[na:na]
at io.cdap.cdap.common.guice.ConfigModule.<init>(ConfigModule.java:62) ~[na:na]
at io.cdap.cdap.common.guice.ConfigModule.<init>(ConfigModule.java:49) ~[na:na]
at io.cdap.cdap.internal.app.runtime.distributed.remote.RemoteExecutionJobMain.initialize(RemoteExecutionJobMain.java:117) ~[na:na]
at io.cdap.cdap.internal.app.runtime.distributed.remote.RemoteExecutionJobMain.doMain(RemoteExecutionJobMain.java:98) ~[na:na]
at io.cdap.cdap.internal.app.runtime.distributed.remote.RemoteExecutionJobMain.main(RemoteExecutionJobMain.java:73) ~[na:na]
... 5 common frames omitted
For 6.2.0, "Remote Hadoop Provisioner" is the right option to use for an existing Dataproc cluster. The issue you ran into is caused by a rare case where API activation fails to assign the necessary role to the Dataproc-specific service account. It can be solved simply by granting the "Dataproc Service Agent" role in your project to the following service account:
service-${project number}@dataproc-accounts.iam.gserviceaccount.com
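For example, a hedged sketch of the grant using the gcloud CLI (my-project-id and 123456789012 are placeholders for your own project ID and project number):

    gcloud projects add-iam-policy-binding my-project-id \
        --member="serviceAccount:service-123456789012@dataproc-accounts.iam.gserviceaccount.com" \
        --role="roles/dataproc.serviceAgent"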
I wasn't able to reproduce the exact scenario, since when creating a CDF instance from scratch the closest similar version I could select was Cloud Data Fusion 6.2.3.
I can confirm that on version 6.2.3 you have the option to choose an Existing Dataproc Cluster, so I would recommend upgrading to at least that version. Follow these docs in order to do it in a safe way.
As an alternative, there is a method to configure a Cloud Data Fusion pipeline to run against an existing cluster here. Note that this feature is available only in the Enterprise edition of Cloud Data Fusion.
I have a Talend Big Data job where I am trying to connect to Hive and create a table. The Hive connection works fine, but tHiveCreateTable throws the exception below.
Exception in component tHiveCreateTable_1 (Test)
org.apache.hive.service.cli.HiveSQLException: Error while processing statement: Cannot modify mapred.job.name at runtime. It is not in list of params that are allowed to be modified at runtime
at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:258)
at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:244)
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:247)
at local_project.test_0_1.Test.tHiveCreateTable_1Process(Test.java:643)
at local_project.test_0_1.Test.tHiveConnection_1Process(Test.java:498)
at local_project.test_0_1.Test.runJobInTOS(Test.java:948)
at local_project.test_0_1.Test.main(Test.java:799)
Caused by: org.apache.hive.service.cli.HiveSQLException: Error while processing statement: Cannot modify mapred.job.name at runtime. It is not in list of params that are allowed to be modified at runtime
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:324)
at org.apache.hive.service.cli.operation.HiveCommandOperation.runInternal(HiveCommandOperation.java:108)
at org.apache.hive.service.cli.operation.Operation.run(Operation.java:264)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:479)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:466)
at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:315)
at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:509)
at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1377)
at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1362)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Earlier, tHiveConnection was failing with the same error. As per one of the older posts, I unchecked the Hadoop properties in the tHiveConnection component and it worked fine. Similar properties are not available in the tHiveCreateTable component, as I am using tHiveConnection to provide the connection details to tHiveCreateTable.
Any help will be appreciated. Thanks
Anil
This is a similar problem to Talend (7.0.1) - Cannot modify mapred.job.name at runtime. Try fixing the property hive.security.authorization.sqlstd.confwhitelist.
I was able to fix the issue by adding a property to the custom hive-site in Ambari:
hive.security.authorization.sqlstd.confwhitelist.append with the value
mapred.job.name|mapred.child.env|query.invoker|hive.query.name
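For reference, that Ambari setting is just this property in hive-site.xml (the value is a pipe-separated list of parameter names to whitelist); restart HiveServer2 afterwards so it takes effect:

    <property>
      <name>hive.security.authorization.sqlstd.confwhitelist.append</name>
      <value>mapred.job.name|mapred.child.env|query.invoker|hive.query.name</value>
    </property>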
I am trying to set up a new JDBC connection to an InterSystems Caché data source, and I'm struggling to know if it can even be done.
Since there was no InterSystems Caché option in the JDBC driver drop-down, I added the driver string manually -> com.intersys.jdbc.CacheDriver
I then added the URL manually in the following format -> jdbc:Cache://123.123.123.123:12345/namespace
I also found the JDBC driver and added it to the Jar File Path -> cachedb.jar
Based on the error message, I am wondering if it's even possible to connect to InterSystems databases with the JDBC connector. What do you think?
When I try to connect, I get the following error:
Exception, if you want to see more information look into the details.
Reason: java.lang.ClassNotFoundException: com.intersys.jdbc.CacheDriver cannot be found by net.sf.jasperreports_6.2.1.final
The Details:
net.sf.jasperreports.engine.JRRuntimeException: java.lang.ClassNotFoundException: com.intersys.jdbc.CacheDriver cannot be found by net.sf.jasperreports_6.2.1.final
at net.sf.jasperreports.data.jdbc.JdbcDataAdapterService.getConnection(JdbcDataAdapterService.java:173)
at net.sf.jasperreports.data.jdbc.JdbcDataAdapterService.contributeParameters(JdbcDataAdapterService.java:128)
at net.sf.jasperreports.data.AbstractDataAdapterService.test(AbstractDataAdapterService.java:128)
at com.jaspersoft.studio.data.wizard.AbstractDataAdapterWizard$3.runOperations(AbstractDataAdapterWizard.java:162)
at com.jaspersoft.studio.utils.jobs.CheckedRunnableWithProgress$1.run(CheckedRunnableWithProgress.java:59)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: com.intersys.jdbc.CacheDriver cannot be found by net.sf.jasperreports_6.2.1.final
at org.eclipse.osgi.internal.loader.BundleLoader.findClassInternal(BundleLoader.java:439)
at org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:352)
at org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:344)
at org.eclipse.osgi.internal.loader.ModuleClassLoader.loadClass(ModuleClassLoader.java:160)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at net.sf.jasperreports.engine.util.JRClassLoader.loadClassForRealName(JRClassLoader.java:174)
at net.sf.jasperreports.data.jdbc.JdbcDataAdapterService.getConnection(JdbcDataAdapterService.java:145)
... 5 more
I have asked this on the JasperReports community page, but it doesn't get much activity over there.
You say that you found cachedb.jar, but you should use cachejdbc.jar. You can find this file at dev/java/lib/JDK17 (or JDK18) in the InterSystems installation folder.
Documentation
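Once cachejdbc.jar is on the classpath (in Jaspersoft Studio, via the data adapter's "Jar File Path"), a quick standalone test like this sketch will tell you whether the driver loads at all before you involve Studio; host, port, namespace, and credentials are placeholders:

    import java.sql.Connection;
    import java.sql.DriverManager;

    public class CacheJdbcSmokeTest {
        public static void main(String[] args) throws Exception {
            // Fails with ClassNotFoundException if the jar is not on the classpath,
            // which is the same root cause as the Jaspersoft error above.
            Class.forName("com.intersys.jdbc.CacheDriver");
            try (Connection c = DriverManager.getConnection(
                    "jdbc:Cache://123.123.123.123:12345/namespace", "user", "password")) {
                System.out.println("Connected: " + c.getMetaData().getDatabaseProductName());
            }
        }
    }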
Under load, using Hazelcast 2.4, we encountered the following Hazelcast exception in a cluster. It seems the underlying issue has been addressed in Hazelcast 2.5. In order to validate that the upgrade indeed addresses the issue we encountered, we would like to reproduce it. In our current setup it only occurs rarely. How can we reproduce it under lab conditions?
I noticed Hazelcast - OperationTimeoutException, which may be related.
com.hazelcast.core.OperationTimeoutException: [CONCURRENT_MAP_CONTAINS_KEY] Operation Timeout (with no response!): 0
at com.hazelcast.impl.BaseManager$ResponseQueueCall.waitAndGetResult(BaseManager.java:619)
at com.hazelcast.impl.BaseManager$ResponseQueueCall.getRedoAwareResult(BaseManager.java:641)
at com.hazelcast.impl.BaseManager$ResponseQueueCall.getResult(BaseManager.java:636)
at com.hazelcast.impl.BaseManager$RequestBasedCall.getResultAsBoolean(BaseManager.java:447)
at com.hazelcast.impl.BaseManager$ResponseQueueCall.getResultAsBoolean(BaseManager.java:555)
at com.hazelcast.impl.BaseManager$RequestBasedCall.booleanCall(BaseManager.java:432)
at com.hazelcast.impl.BaseManager$ResponseQueueCall.booleanCall(BaseManager.java:555)
at com.hazelcast.impl.ConcurrentMapManager$MContainsKey.containsKey(ConcurrentMapManager.java:622)
at com.hazelcast.impl.MProxyImpl$MProxyReal.containsKey(MProxyImpl.java:937)
at sun.reflect.GeneratedMethodAccessor322.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.hazelcast.impl.MProxyImpl$DynamicInvoker.invoke(MProxyImpl.java:66)
at com.sun.proxy.$Proxy180.containsKey(Unknown Source)
at com.hazelcast.impl.MProxyImpl.containsKey(MProxyImpl.java:312)
When I run my HBase custom filter, I get this error:
org.apache.hadoop.hbase.client.RpcRetryingCaller#459c8c0a, java.io.IOException: java.io.IOException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.hbase.protobuf.ProtobufUtil.toFilter(ProtobufUtil.java:1360)
at org.apache.hadoop.hbase.protobuf.ProtobufUtil.toScan(ProtobufUtil.java:916)
at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3056)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:28454)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2008)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92)
at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hbase.protobuf.ProtobufUtil.toFilter(ProtobufUtil.java:1358)
... 9 more
Caused by: org.apache.hadoop.hbase.exceptions.DeserializationException: java.io.IOException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.hbase.filter.FilterList.parseFrom(FilterList.java:406)
... 14 more
Caused by: java.io.IOException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.hbase.protobuf.ProtobufUtil.toFilter(ProtobufUtil.java:1360)
at org.apache.hadoop.hbase.filter.FilterList.parseFrom(FilterList.java:403)
... 14 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hbase.protobuf.ProtobufUtil.toFilter(ProtobufUtil.java:1358)
... 15 more
Caused by: org.apache.hadoop.hbase.exceptions.DeserializationException: parseFrom called on base Filter, but should be called on derived type
at org.apache.hadoop.hbase.filter.Filter.parseFrom(Filter.java:267)
... 20 more
Does anybody know how I can fix it?
I also had this error when trying to make a custom filter. My problem was that I did not include the methods "toByteArray" and "parseFrom" in my filter. See here for where I found the solution, and links to examples. (It took me two weeks of digging to find; HBase could really use some better documentation...)
As for what needs to go into those methods, I'm still having trouble in that regard. Conceptually (as I understand it), their purpose is to encode and decode the identifying information for your filter instance (basically, the information you would send to the constructor) into a serialized string of bytes. That way the particular filter can be 'instantiated' wherever it's needed, as in the sketch below.
For me, including these methods prevented the hang and error, and my program now runs through to completion. I don't think I entirely understand the methods correctly, though, as it seems the filter still doesn't actually run, but that's another topic. (If you figure it out, let me know!)
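To make that concrete, here is a minimal sketch of a custom filter with both methods. PrefixMatchFilter is my own hypothetical example, written against the 0.96-era Filter API; any serialization that round-trips the constructor arguments will do (HBase's built-in filters use protobuf for this):

    import org.apache.hadoop.hbase.Cell;
    import org.apache.hadoop.hbase.CellUtil;
    import org.apache.hadoop.hbase.exceptions.DeserializationException;
    import org.apache.hadoop.hbase.filter.FilterBase;
    import org.apache.hadoop.hbase.util.Bytes;

    // Hypothetical custom filter: keeps only rows whose key starts with a prefix.
    public class PrefixMatchFilter extends FilterBase {
        private final byte[] prefix;

        public PrefixMatchFilter(byte[] prefix) {
            this.prefix = prefix;
        }

        // Serialize the constructor arguments; the client ships these bytes to the
        // region servers along with the filter's class name.
        @Override
        public byte[] toByteArray() {
            return prefix.clone();
        }

        // Must be a static method on the derived type; the region server finds it by
        // reflection, which is why a missing or misplaced parseFrom surfaces as the
        // InvocationTargetException / "parseFrom called on base Filter" errors above.
        public static PrefixMatchFilter parseFrom(byte[] bytes) throws DeserializationException {
            return new PrefixMatchFilter(bytes);
        }

        @Override
        public ReturnCode filterKeyValue(Cell cell) {
            return Bytes.startsWith(CellUtil.cloneRow(cell), prefix)
                    ? ReturnCode.INCLUDE
                    : ReturnCode.NEXT_ROW;
        }
    }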
I had one cluster server that was giving this same error. Note that toByteArray and parseFrom were already present, and the same jar file worked on other clusters just fine. I was able to solve it by restarting the HBase and ZooKeeper services together, along with first ensuring that the /hbase/lib folder and the custom filter jar file had the appropriate owner (set it to the hbase user).
I'm not able to replicate the error, but what I did above solved it for me. I tried changing the owner, the HBase config for the /hbase/lib folder, and creating a new folder, but couldn't replicate it, so it could just come down to the HBase restart.
The missing link is now located here