tHiveCreateTable component gives "org.apache.hive.service.cli.HiveSQLException" exception - hadoop

I have a Talend Big Data job where I am trying to connect to Hive and create a table. The Hive connection works fine, but tHiveCreateTable throws the exception below.
Exception in component tHiveCreateTable_1 (Test)
org.apache.hive.service.cli.HiveSQLException: Error while processing statement: Cannot modify mapred.job.name at runtime. It is not in list of params that are allowed to be modified at runtime
at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:258)
at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:244)
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:247)
at local_project.test_0_1.Test.tHiveCreateTable_1Process(Test.java:643)
at local_project.test_0_1.Test.tHiveConnection_1Process(Test.java:498)
at local_project.test_0_1.Test.runJobInTOS(Test.java:948)
at local_project.test_0_1.Test.main(Test.java:799)
Caused by: org.apache.hive.service.cli.HiveSQLException: Error while processing statement: Cannot modify mapred.job.name at runtime. It is not in list of params that are allowed to be modified at runtime
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:324)
at org.apache.hive.service.cli.operation.HiveCommandOperation.runInternal(HiveCommandOperation.java:108)
at org.apache.hive.service.cli.operation.Operation.run(Operation.java:264)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:479)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:466)
at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:315)
at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:509)
at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1377)
at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1362)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Earlier, tHiveConnection was failing with the same error. As suggested in an older post, I unchecked the Hadoop properties in the tHiveConnection component and it worked fine. Those properties are not available in the tHiveCreateTable component, since I am using tHiveConnection to provide the connection details to tHiveCreateTable.
Any help will be appreciated. Thanks
Anil

This is a similar problem to "Talend (7.0.1) - Cannot modify mapred.job.name at runtime". Try fixing the property hive.security.authorization.sqlstd.confwhitelist.

I was able to fix the issue by adding a property to the custom hive-site section in Ambari:
hive.security.authorization.sqlstd.confwhitelist.append
with the value
mapred.job.name|mapred.child.env|query.invoker|hive.query.name
That fixed the issue.
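For what it's worth, one quick way to confirm the new whitelist value has propagated (after restarting the Hive services from Ambari) is to read the property back from beeline. This is just a sketch; the JDBC URL is a placeholder for your HiveServer2 host.
# Read the appended whitelist back from HiveServer2; adjust the URL to your environment.
beeline -u "jdbc:hive2://your-hiveserver2-host:10000" \
  -e "set hive.security.authorization.sqlstd.confwhitelist.append;"
# Expected output once the change is live:
# hive.security.authorization.sqlstd.confwhitelist.append=mapred.job.name|mapred.child.env|query.invoker|hive.query.name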

Related

EMR - Cannot create external Hive table using jdbcstoragehandler

I am trying to create an external Hive table on Postgres.
My first error was resolved by the answer in the topic below:
Cannot create Hive external table using jdbcStorageHandler
But I hit another issue:
java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException java.lang.IllegalArgumentException: No enum constant org.apache.hive.storage.jdbc.conf.DatabaseType.postgres)
Surprisingly, I could not find anything on this issue in any forum so far.
Has anyone encountered and resolved this error on EMR?
I finally resolved it and am posting the answer in case it helps someone.
The root cause was a conflicting old version of the same jar left in the Hive lib directory, so Hive was not picking up the new jar and kept referring to the old one.
After I deleted the old jar, the problem was resolved.
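In case it helps anyone hitting the same thing, here is a minimal sketch of how one might look for the conflicting jar on an EMR master node. The lib path, jar name pattern, and service name are assumptions and vary by EMR/Hive release.
# List JDBC storage handler jars in the Hive lib directory (path assumed).
ls -l /usr/lib/hive/lib/ | grep -i jdbc-handler
# If an old and a new hive-jdbc-handler jar both show up, move the old one aside
# (OLD_VERSION is a placeholder) and restart HiveServer2 so the new classpath is picked up.
sudo mv /usr/lib/hive/lib/hive-jdbc-handler-OLD_VERSION.jar /tmp/
sudo systemctl restart hive-server2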

NiFi FlowFile Repository failed to update

I'm using Apache NiFi to ingest and preprocess some CSV files, but when it runs for a long time it always fails. The error is always the same:
FlowFile Repository failed to update
Searching at logs, I see this error always:
2018-07-11 22:42:49,913 ERROR [Timer-Driven Process Thread-10] o.a.n.p.attributes.UpdateAttribute UpdateAttribute[id=c7f45dc9-ee12-31b0-8dee-6f1746b3c544] Failed to process session due to org.apache.nifi.processor.exception.ProcessException: FlowFile Repository failed to update: org.apache.nifi.processor.exception.ProcessException: FlowFile Repository failed to update
org.apache.nifi.processor.exception.ProcessException: FlowFile Repository failed to update
at org.apache.nifi.controller.repository.StandardProcessSession.commit(StandardProcessSession.java:405)
at org.apache.nifi.controller.repository.StandardProcessSession.commit(StandardProcessSession.java:336)
at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:28)
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203)
at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Cannot update journal file ./flowfile_repository/journals/8772495.journal because this journal has already been closed
at org.apache.nifi.wali.LengthDelimitedJournal.checkState(LengthDelimitedJournal.java:223)
at org.apache.nifi.wali.LengthDelimitedJournal.update(LengthDelimitedJournal.java:178)
at org.apache.nifi.wali.SequentialAccessWriteAheadLog.update(SequentialAccessWriteAheadLog.java:121)
at org.apache.nifi.controller.repository.WriteAheadFlowFileRepository.updateRepository(WriteAheadFlowFileRepository.java:300)
at org.apache.nifi.controller.repository.WriteAheadFlowFileRepository.updateRepository(WriteAheadFlowFileRepository.java:257)
This makes me believe that the root cause is that NiFi cannot update the journal file ./flowfile_repository/journals/8772495.journal because the journal has already been closed, as seen in the log file.
How can I solve this issue?
Thanks!
I ran into the same issue the other day.
When I inspected the disk space on the volume where "flowfile_repository" is located, I saw this:
/dev/sdc1 447G 447G 24K 100% /var/proj/data2
It's 100% full.
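If you want to confirm quickly what is eating the space, something like the following works. The mount point comes from the df output above; the repository paths assume a default NiFi layout under the install directory.
# Confirm the mount is full, then see which NiFi directory is using the space.
df -h /var/proj/data2
du -sh ./flowfile_repository ./content_repository ./provenance_repository ./logs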
If NiFi is having issues writing to the journal file there are a few things to check.
Are you reading in fields from your CSV that are large (greater than 64kb) and attempting to assign them to attributes? You may want to consider processing that particular field in your CSV as a separate flowfile and matching it with attributes later. See this mailing list discussion for more information.
Have you checked NiFi's configuration against the best practices listed in the administration guide? I also recommend understanding each of the Flowfile repository settings. It will allow you to ask more targeted questions.
It is likely worth updating your JVM settings to allow for larger file processing. Check out this post on Hortonworks detailing best practices for high performance systems.
In order to solve the problem you may need to tweak a couple of things. Does the flow handle the CSV in an efficient manner? Does NiFi have enough memory to do what it needs to with the data? Would it be more appropriate to handle the CSV files as records? If that concept is unfamiliar check out this post that introduces record processing in NiFi. I hope some of these resources help you get a little closer to a solution. If you have a follow up question let me know.
I encountered the same problem after running NiFi for two days on my Ubuntu system.
First, I ran du -shr ./* under the NiFi folder, and it turned out there were a lot of application logs there. Each log file was 101 MB, which I think is the default retention value.
Since I don't need to keep that many logs for now, I updated the logback.xml file in NiFi's conf folder to set the application log to daily rollover.

How do I setup an InterSystems Cache Data Source in Jaspersoft Studio

I am trying to set up a new JDBC connection to an Intersystems Cache data source, and I'm struggling to know if it can even be done.
Since there was no Intersystems Cache option in the JDBC driver drop down, I added the driver string manually -> com.intersys.jdbc.CacheDriver
I then added the URL manually in the following format -> jdbc:Cache://123.123.123.123:12345/namespace
I also found the JDBC driver and have added it to the Jar File Path -> cachedb.jar
Based on the error message, I am wondering if it's even possible to connect to intersystems databases with the JDBC connector. What do you think?
When I try to connect, I get the following error:
Exception, if you want to see more information look into the details.
Reason: java.lang.ClassNotFoundException: com.intersys.jdbc.CacheDriver cannot be found by net.sf.jasperreports_6.2.1.final
The Details:
net.sf.jasperreports.engine.JRRuntimeException: java.lang.ClassNotFoundException: com.intersys.jdbc.CacheDriver cannot be found by net.sf.jasperreports_6.2.1.final
at net.sf.jasperreports.data.jdbc.JdbcDataAdapterService.getConnection(JdbcDataAdapterService.java:173)
at net.sf.jasperreports.data.jdbc.JdbcDataAdapterService.contributeParameters(JdbcDataAdapterService.java:128)
at net.sf.jasperreports.data.AbstractDataAdapterService.test(AbstractDataAdapterService.java:128)
at com.jaspersoft.studio.data.wizard.AbstractDataAdapterWizard$3.runOperations(AbstractDataAdapterWizard.java:162)
at com.jaspersoft.studio.utils.jobs.CheckedRunnableWithProgress$1.run(CheckedRunnableWithProgress.java:59)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: com.intersys.jdbc.CacheDriver cannot be found by net.sf.jasperreports_6.2.1.final
at org.eclipse.osgi.internal.loader.BundleLoader.findClassInternal(BundleLoader.java:439)
at org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:352)
at org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:344)
at org.eclipse.osgi.internal.loader.ModuleClassLoader.loadClass(ModuleClassLoader.java:160)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at net.sf.jasperreports.engine.util.JRClassLoader.loadClassForRealName(JRClassLoader.java:174)
at net.sf.jasperreports.data.jdbc.JdbcDataAdapterService.getConnection(JdbcDataAdapterService.java:145)
... 5 more
I have asked this on the JasperReports community page, but it doesn't get much activity there.
You say that you found cachedb.jar, but you should use cachejdbc.jar. You can find this file under dev/java/lib/JDK(17|18) in the InterSystems installation folder.
Documentation
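If you want to double-check that the jar you are pointing Jaspersoft Studio at really contains the driver class, a quick listing works; the installation path below is only an example and differs per Caché install.
# Confirm the driver class is inside the jar (path is an example).
unzip -l /intersystems/cache/dev/java/lib/JDK18/cachejdbc.jar | grep -i cachedriver
# Expect to see com/intersys/jdbc/CacheDriver.class in the listing.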

Error in metadata: org.apache.thrift.transport.TTransportException

What does this error mean?
"Error in metadata: org.apache.thrift.transport.TTransportException"
In what cases does this error occur?
I am getting it while creating tables and while loading data into a table.
org.apache.thrift.transport.TTransportException is a very generic error message indicating that the HiveServer is having a problem, and it suggests taking a look at the Hive logs. If you can access the full log stack and share the exact details, we might find the real cause of the problem. Most of the times I have faced this error, it came down to issues with the Hive metadata (or being unable to access it), directory permission issues, concurrency-related issues, or HiveServer port problems.
You can try restarting and recreating your tables, or setting the Hive port before starting the server might help:
$export HIVE_PORT=10000
$hive --service hiveserver
There might be other reasons too, but we can look into those once we have the full log stack.
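To get that full log stack, these are common (but not guaranteed) places to look; the exact paths depend on your distribution and hive-log4j settings.
tail -n 200 /tmp/$USER/hive.log            # default Hive client/session log location
tail -n 200 /var/log/hive/hiveserver2.log  # typical HiveServer2 log on packaged installs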

How to access WSO2 BAM's hadoop job tracker?

I am quite new to BAM, and one of my Hive queries is broken.
However, I can't find what's wrong, since the only error it gives me is:
ERROR: Error while executing Hive script.Query returned non-zero code: 9, cause: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask
I've looked around and found out that BAM is only capable of displaying that much information; for anything more, I need to look in Hadoop's job tracker. However, I can't find any info on how to turn it on or access it on the BAM server.
So how do I access it / turn it on?
Please don't be misled by the exception; most probably this is a problem with the Hive query itself. To get a proper idea of the problem, you should send the back-end console log.
It seems the problem is most likely with your Hive query and not with the Hadoop job tracker. To make sure, please run one of the samples [1] and check whether its Hive queries execute properly. If those Hive queries execute without a problem and the summarized results are displayed in the dashboards, the problem lies with your own Hive query.
[1] - http://docs.wso2.org/display/BAM240/HTTPD+Logs+Analysis+Sample

Resources