Cloud Data Fusion - Existing Dataproc option missing

According to the documentation, there is an option to use an existing Dataproc cluster in version 6.2 and above.
We use Cloud Data Fusion 6.2.0, but the Existing Dataproc option does not appear when we try to create a new compute profile.
What are we doing wrong? Why does the described option not show up? Do we have to do some additional configuration?
UPDATE 1
When I choose Dataproc, I see the following:
UPDATE 2
When we try to use the Remote Hadoop Provisioner, we get the following error message in the /logs/program.log file. The SSH connection is successful, because the run-id folder is there.
2021-06-15 09:40:37,617 - ERROR [main:o.a.z.s.NIOServerCnxnFactory#44] - Thread Thread[main,5,main] died
java.lang.reflect.InvocationTargetException: null
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_282]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_282]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_282]
at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_282]
at io.cdap.cdap.internal.app.runtime.distributed.remote.RemoteLauncher.main(RemoteLauncher.java:73) ~[launcher.jar:na]
Caused by: java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357) ~[hadoop-common-3.2.2.jar:na]
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338) ~[hadoop-common-3.2.2.jar:na]
at io.cdap.cdap.common.conf.CConfigurationUtil.copyTxProperties(CConfigurationUtil.java:100) ~[na:na]
at io.cdap.cdap.common.guice.ConfigModule.<init>(ConfigModule.java:62) ~[na:na]
at io.cdap.cdap.common.guice.ConfigModule.<init>(ConfigModule.java:49) ~[na:na]
at io.cdap.cdap.internal.app.runtime.distributed.remote.RemoteExecutionJobMain.initialize(RemoteExecutionJobMain.java:117) ~[na:na]
at io.cdap.cdap.internal.app.runtime.distributed.remote.RemoteExecutionJobMain.doMain(RemoteExecutionJobMain.java:98) ~[na:na]
at io.cdap.cdap.internal.app.runtime.distributed.remote.RemoteExecutionJobMain.main(RemoteExecutionJobMain.java:73) ~[na:na]
... 5 common frames omitted

For 6.2.0 , "Remote Hadoop Provisioner" is the right option to use for existing dataproc cluster. And the stucking issue you met with is caused by a rare case where API activation failed to assign the necessary role to the Dataproc-specific service account. This problem can be solved simply by granting the following service account the "Dataproc Service Agent" role in your project:
service-${project number}#dataproc-accounts.iam.gserviceaccount.com
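For example, with the gcloud CLI the binding can be granted like this (a sketch; substitute your own project ID and project number):
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:service-PROJECT_NUMBER@dataproc-accounts.iam.gserviceaccount.com" \
    --role="roles/dataproc.serviceAgent"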

I wasn't able to reproduce the exact scenario, since when creating a CDF instance from scratch the closest version I was able to select was Cloud Data Fusion 6.2.3.
I can confirm that on version 6.2.3 you have the option to choose an Existing Dataproc cluster. Therefore I would recommend upgrading to at least that version. Follow these docs in order to do it in a safe way.
As an alternative, there is a method to configure a Cloud Data Fusion pipeline to run against an existing cluster here. This feature is available only in the Enterprise edition of Cloud Data Fusion.

Related

Azure web app (Linux) started throwing font error for apache poi xlsx export

I have a Java application which uses Apache POI to generate an xlsx export.
The app is deployed on Azure App Service as a web app on a Linux setup and had been working fine for many months (no font config was ever installed on the Azure web service), but it suddenly started throwing an error in the worksheet creation method saying the font was not found.
Below is the stack trace:
Caused by: java.lang.InternalError: java.lang.reflect.InvocationTargetException
2022-03-04T11:49:12.048900166Z at java.desktop/sun.font.FontManagerFactory$1.run(FontManagerFactory.java:86)
2022-03-04T11:49:12.048903666Z at java.base/java.security.AccessController.doPrivileged(Native Method)
2022-03-04T11:49:12.048907266Z at java.desktop/sun.font.FontManagerFactory.getInstance(FontManagerFactory.java:74)
2022-03-04T11:49:12.048910866Z at java.desktop/java.awt.Font.getFont2D(Font.java:497)
2022-03-04T11:49:12.048914366Z at java.desktop/java.awt.Font.canDisplayUpTo(Font.java:2250)
2022-03-04T11:49:12.048917866Z at java.desktop/java.awt.font.TextLayout.singleFont(TextLayout.java:469)
2022-03-04T11:49:12.048924066Z at java.desktop/java.awt.font.TextLayout.<init>(TextLayout.java:530)
2022-03-04T11:49:12.048927967Z at org.apache.poi.ss.util.SheetUtil.getDefaultCharWidth(SheetUtil.java:273)
2022-03-04T11:49:12.048931467Z at org.apache.poi.xssf.streaming.AutoSizeColumnTracker.<init>(AutoSizeColumnTracker.java:117)
2022-03-04T11:49:12.048935167Z at org.apache.poi.xssf.streaming.SXSSFSheet.<init>(SXSSFSheet.java:82)
2022-03-04T11:49:12.048938867Z at org.apache.poi.xssf.streaming.SXSSFWorkbook.createAndRegisterSXSSFSheet(SXSSFWorkbook.java:674)
2022-03-04T11:49:12.048942367Z at org.apache.poi.xssf.streaming.SXSSFWorkbook.createSheet(SXSSFWorkbook.java:695)
2022-03-04T11:49:12.048946167Z at org.xxx.xxx.utils.EXCELReportExporter.writeHeaderLine(EXCELBenchReportExporter.java:28)
and deeper down in the stack trace:
2022-03-04T11:49:12.049101970Z Caused by: java.lang.reflect.InvocationTargetException: null
2022-03-04T11:49:12.049105470Z at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
2022-03-04T11:49:12.049109070Z at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
2022-03-04T11:49:12.049112670Z at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
2022-03-04T11:49:12.049116270Z at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
2022-03-04T11:49:12.049119870Z at java.desktop/sun.font.FontManagerFactory$1.run(FontManagerFactory.java:84)
2022-03-04T11:49:12.049123370Z ... 156 common frames omitted
2022-03-04T11:49:12.049126870Z Caused by: java.lang.NullPointerException: null
2022-03-04T11:49:12.049130370Z at java.desktop/sun.awt.FontConfiguration.getVersion(FontConfiguration.java:1262)
2022-03-04T11:49:12.049133970Z at java.desktop/sun.awt.FontConfiguration.readFontConfigFile(FontConfiguration.java:225)
2022-03-04T11:49:12.049137571Z at java.desktop/sun.awt.FontConfiguration.init(FontConfiguration.java:107)
2022-03-04T11:49:12.049141071Z at java.desktop/sun.awt.X11FontManager.createFontConfiguration(X11FontManager.java:719)
2022-03-04T11:49:12.049144671Z at java.desktop/sun.font.SunFontManager$2.run(SunFontManager.java:379)
2022-03-04T11:49:12.049148171Z at java.base/java.security.AccessController.doPrivileged(Native Method)
2022-03-04T11:49:12.049151671Z at java.desktop/sun.font.SunFontManager.<init>(SunFontManager.java:324)
2022-03-04T11:49:12.049155371Z at java.desktop/sun.awt.FcFontManager.<init>(FcFontManager.java:35)
2022-03-04T11:49:12.049158971Z at java.desktop/sun.awt.X11FontManager.<init>(X11FontManager.java:56)
2022-03-04T11:49:12.049162671Z ... 161 common frames omitted
Apparently no change has been made to the application or the environment.
The same app was tested on Windows and Linux with JDK 8 and 11, and it still works fine locally.
Any help is much appreciated.
Apologies that you're experiencing this issue.
Kindly check whether the Java version on the Web App is set to auto-update.
This may cause the Java minor version to change to the most recent one and may throw exceptions on some API calls.
As outlined in this doc, it is recommended to use a fixed version for production environments and not auto-update.
To update the Java version, please try these steps:
1. Navigate to your Web App on the Azure Portal.
2. Select the Configuration tab under the Settings blade, and then General Settings.
3. If you have "Java SE (Embedded Web Server) (auto-update)" selected under the "Java web server" option, change the Java minor version to 11.0.11. Then check to see if it helps.
I am having the same issue: Apache POI FontConfiguration NPE on Azure. The only workaround I have found for the moment is to set the Java minor version to "11.0.11". I also created a GitHub ticket: https://github.com/Azure/azure-cli/issues/21540. The problem I have is that I haven't found a way to force the minor version using an "az webapp create" command.
If you are using Apache Tomcat 8.5, have a look at the automatic update in the Configuration tab on the Azure Portal.
In versions newer than 8.5.66, the Docker image of the service also contains an update of Java (not the version but the "provider": they switched to OpenJDK), and there are no fonts installed, which causes the problem.
I fixed it by rolling back to version 8.5.66.
I am currently having the same issue. A temporary fix I applied was to remove all the styling and formatting. I think this is an issue in the Azure App Service container itself. I've raised this question to the Azure community and will update here once I get an answer.
Another workaround would be to deploy the app in a custom container, as sketched below.
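For the custom container route, here is a minimal Dockerfile sketch that installs fontconfig and a basic font set, assuming a Debian-based Tomcat image (the base image tag and war file name are placeholders):
FROM tomcat:9.0-jdk11-openjdk-slim
# Install fontconfig and a basic font package so java.awt font lookups succeed
RUN apt-get update \
 && apt-get install -y --no-install-recommends fontconfig fonts-dejavu-core \
 && rm -rf /var/lib/apt/lists/*
# app.war is a placeholder for your actual build artifact
COPY app.war /usr/local/tomcat/webapps/ROOT.war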
For Apache Tomcat 9, it works if you choose Tomcat version 9.0.46.

tHiveCreateTable component gives "org.apache.hive.service.cli.HiveSQLException" exception

I have a Talend Big Data job where I am trying to connect to Hive and create a table. The Hive connection works fine, but tHiveCreateTable gives the exception below.
Exception in component tHiveCreateTable_1 (Test)
org.apache.hive.service.cli.HiveSQLException: Error while processing statement: Cannot modify mapred.job.name at runtime. It is not in list of params that are allowed to be modified at runtime
at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:258)
at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:244)
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:247)
at local_project.test_0_1.Test.tHiveCreateTable_1Process(Test.java:643)
at local_project.test_0_1.Test.tHiveConnection_1Process(Test.java:498)
at local_project.test_0_1.Test.runJobInTOS(Test.java:948)
at local_project.test_0_1.Test.main(Test.java:799)
Caused by: org.apache.hive.service.cli.HiveSQLException: Error while processing statement: Cannot modify mapred.job.name at runtime. It is not in list of params that are allowed to be modified at runtime
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:324)
at org.apache.hive.service.cli.operation.HiveCommandOperation.runInternal(HiveCommandOperation.java:108)
at org.apache.hive.service.cli.operation.Operation.run(Operation.java:264)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:479)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:466)
at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:315)
at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:509)
at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1377)
at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1362)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Earlier, the tHiveConnection was failing with the same error. As per one of the older posts, I unchecked the Hadoop properties in the tHiveConnection component and it worked fine. Similar properties are not available in the tHiveCreateTable component, as I am using tHiveConnection to provide connection details to the tHiveCreateTable component.
Any help will be appreciated. Thanks,
Anil
This is a similar problem to Talend (7.0.1) - Cannot modify mapred.job.name at runtime. Try fixing the property hive.security.authorization.sqlstd.confwhitelist.
I was able to fix the issue by adding a property to the custom hive-site in Ambari:
hive.security.authorization.sqlstd.confwhitelist.append
with the value
mapred.job.name|mapred.child.env|query.invoker|hive.query.name

How to reproduce Hazelcast OperationTimeoutException: [CONCURRENT_MAP_CONTAINS_KEY]

Under load, using Hazelcast 2.4, we encountered the following Hazelcast exception in a cluster. It seems the underlying issue has been addressed in Hazelcast 2.5. In order to validate that the upgrade indeed addresses the issue we encountered, we would like to reproduce it first. In our current setup it only occurs rarely. How can we reproduce it under lab conditions?
I noticed Hazelcast - OperationTimeoutException, which may be related.
com.hazelcast.core.OperationTimeoutException: [CONCURRENT_MAP_CONTAINS_KEY] Operation Timeout (with no response!): 0
at com.hazelcast.impl.BaseManager$ResponseQueueCall.waitAndGetResult(BaseManager.java:619)
at com.hazelcast.impl.BaseManager$ResponseQueueCall.getRedoAwareResult(BaseManager.java:641)
at com.hazelcast.impl.BaseManager$ResponseQueueCall.getResult(BaseManager.java:636)
at com.hazelcast.impl.BaseManager$RequestBasedCall.getResultAsBoolean(BaseManager.java:447)
at com.hazelcast.impl.BaseManager$ResponseQueueCall.getResultAsBoolean(BaseManager.java:555)
at com.hazelcast.impl.BaseManager$RequestBasedCall.booleanCall(BaseManager.java:432)
at com.hazelcast.impl.BaseManager$ResponseQueueCall.booleanCall(BaseManager.java:555)
at com.hazelcast.impl.ConcurrentMapManager$MContainsKey.containsKey(ConcurrentMapManager.java:622)
at com.hazelcast.impl.MProxyImpl$MProxyReal.containsKey(MProxyImpl.java:937)
at sun.reflect.GeneratedMethodAccessor322.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.hazelcast.impl.MProxyImpl$DynamicInvoker.invoke(MProxyImpl.java:66)
at com.sun.proxy.$Proxy180.containsKey(Unknown Source)
at com.hazelcast.impl.MProxyImpl.containsKey(MProxyImpl.java:312)
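One way to try to provoke this under lab conditions is to shrink the operation timeout and make the member that owns the keys unresponsive (e.g. pause its JVM with kill -STOP), so containsKey calls get no response. Below is a rough Java sketch under those assumptions; the timeout property name differs across versions (hazelcast.max.operation.timeout in 2.x vs hazelcast.operation.call.timeout.millis in 3.x), so verify it against the GroupProperties of your release.
import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;

public class TimeoutRepro {
    public static void main(String[] args) {
        Config cfg = new Config();
        // Shorten the call timeout so the test fails fast
        // (property name is version-dependent; see note above).
        cfg.setProperty("hazelcast.max.operation.timeout", "1000");
        HazelcastInstance hz = Hazelcast.newHazelcastInstance(cfg);
        IMap<String, String> map = hz.getMap("test");
        // Start a second member with the same config, wait until it owns
        // partitions, then pause it externally (kill -STOP <pid>). Calls that
        // route to its partitions should then time out with
        // OperationTimeoutException (with no response).
        for (int i = 0; i < 1000000; i++) {
            map.containsKey("key-" + i);
        }
    }
}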

Protobuf error with custom filter

When I run my HBase custom filter, I get this error:
org.apache.hadoop.hbase.client.RpcRetryingCaller#459c8c0a, java.io.IOException: java.io.IOException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.hbase.protobuf.ProtobufUtil.toFilter(ProtobufUtil.java:1360)
at org.apache.hadoop.hbase.protobuf.ProtobufUtil.toScan(ProtobufUtil.java:916)
at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3056)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:28454)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2008)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92)
at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hbase.protobuf.ProtobufUtil.toFilter(ProtobufUtil.java:1358)
... 9 more
Caused by: org.apache.hadoop.hbase.exceptions.DeserializationException: java.io.IOException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.hbase.filter.FilterList.parseFrom(FilterList.java:406)
... 14 more
Caused by: java.io.IOException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.hbase.protobuf.ProtobufUtil.toFilter(ProtobufUtil.java:1360)
at org.apache.hadoop.hbase.filter.FilterList.parseFrom(FilterList.java:403)
... 14 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hbase.protobuf.ProtobufUtil.toFilter(ProtobufUtil.java:1358)
... 15 more
Caused by: org.apache.hadoop.hbase.exceptions.DeserializationException: parseFrom called on base Filter, but should be called on derived type
at org.apache.hadoop.hbase.filter.Filter.parseFrom(Filter.java:267)
... 20 more
Does anybody know how I can fix it?
I also had this error when trying to make a custom filter. My problem was that I did not include the methods "toByteArray" and "parseFrom" in my filter. See here for where I found the solution and links to examples. (It took me two weeks of digging to find; HBase could really use some better documentation...)
As far as what needs to go into those methods, I'm still having trouble in that regard. Conceptually (as I understand it), their purpose is to encode and decode the identifying information for your instance of the filter (basically, the information you would send to the constructor) into a serialized string of bytes. That way the particular filter can be 'instantiated' wherever it's needed.
For me, including these methods prevented the hang and the error, and my program now runs through to completion. I don't think I entirely understand the methods correctly, though, as it seems the filter still doesn't actually run, but that's another topic. (If you figure it out, let me know!)
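For anyone looking for the shape of those two methods, here is a minimal sketch, assuming a hypothetical filter whose only state is a byte[] prefix (real filters typically serialize their state via protobuf rather than raw bytes):
import org.apache.hadoop.hbase.exceptions.DeserializationException;
import org.apache.hadoop.hbase.filter.FilterBase;

public class MyPrefixFilter extends FilterBase {
    private final byte[] prefix;

    public MyPrefixFilter(byte[] prefix) {
        this.prefix = prefix;
    }

    // Serializes the constructor arguments so region servers can rebuild the filter.
    @Override
    public byte[] toByteArray() {
        return prefix.clone();
    }

    // Must be declared on the derived type; otherwise HBase falls back to
    // Filter.parseFrom, which throws the DeserializationException seen above.
    public static MyPrefixFilter parseFrom(byte[] bytes) throws DeserializationException {
        return new MyPrefixFilter(bytes);
    }

    // filterRowKey / filterKeyValue callbacks omitted for brevity.
}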
I had one cluster server that was giving this same error. Note that toByteArray and parseFrom were already present, and the same jar file worked on other clusters just fine. I was able to solve it by restarting the HBase and ZooKeeper services together, along with first ensuring that the /hbase/lib folder and the custom filter jar file had the appropriate owner (set it to the hbase user).
I'm not able to replicate the error, but what I did above solved it for me. I tried changing the owner, the HBase config for the /hbase/lib folder, and creating a new folder, but couldn't replicate it, so it could just come down to the HBase restart.
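For reference, setting that owner is a one-liner (a sketch; adjust the path to wherever your lib folder and filter jar actually live):
chown -R hbase:hbase /hbase/lib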
The missing link is now located here

Problem with zohmg data import into hbase

I have used zohmg and successfully created the mapper and table in HBase, and test-imported my data (using the --local switch).
But I have a problem inserting my data into HBase after mapping; this is the error I get:
Exception in thread "main" java.lang.RuntimeException: class org.apache.hadoop.hbase.mapreduce.TableOutputFormat not org.apache.hadoop.mapred.OutputFormat
at org.apache.hadoop.conf.Configuration.setClass(Configuration.java:1034)
at org.apache.hadoop.mapred.JobConf.setOutputFormat(JobConf.java:471)
at org.apache.hadoop.streaming.StreamJob.setJobConf(StreamJob.java:818)
at org.apache.hadoop.streaming.StreamJob.run(StreamJob.java:122)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:50)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
Do you have any clues that might help me fix it? (Or maybe this is a problem with Dumbo?)
Hadoop 0.20 introduced a new package, org.apache.hadoop.mapreduce, and deprecated the old one, org.apache.hadoop.mapred. HBase 0.20 followed suit with its MapReduce support. It looks like this code is expecting an OutputFormat for the old API but getting the HBase TableOutputFormat for the new API.
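To make the distinction concrete, here is a hedged Java sketch of an old-API job setup; HBase ships the old-API class in org.apache.hadoop.hbase.mapred, not org.apache.hadoop.hbase.mapreduce (the table name is illustrative; verify the OUTPUT_TABLE constant against your HBase version):
import org.apache.hadoop.hbase.mapred.TableOutputFormat; // old (mapred) API class
import org.apache.hadoop.mapred.JobConf;

public class OldApiJobSetup {
    public static JobConf configure() {
        JobConf conf = new JobConf();
        // setOutputFormat only accepts org.apache.hadoop.mapred.OutputFormat
        // implementations; passing the new-API TableOutputFormat from
        // org.apache.hadoop.hbase.mapreduce triggers the RuntimeException above.
        conf.setOutputFormat(TableOutputFormat.class);
        conf.set(TableOutputFormat.OUTPUT_TABLE, "mytable"); // illustrative table name
        return conf;
    }
}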
It looks like the latest commit over at GitHub may help with this; it says "added patch for reverting to the old api." (It also looks like there hasn't been any activity for a while.)
http://github.com/zohmg/zohmg/commits/master
