Pentaho: Not able to write data from Pentaho to BigQuery - jdbc

I am using Starschema's JDBC driver to connect Pentaho to BigQuery. I am able to successfully fetch data from BigQuery into Pentaho; however, I am not able to write data from Pentaho into BigQuery. An exception is thrown while inserting rows into BigQuery, and it seems the operation may not be supported. How do I solve this?
Error message:
2017/10/30 14:27:43 - Table output 2.0 - ERROR (version 7.1.0.0-12, build 1 from 2017-05-16 17.18.02 by buildguy) : Because of an error, this step can't continue:
2017/10/30 14:27:43 - Table output 2.0 - ERROR (version 7.1.0.0-12, build 1 from 2017-05-16 17.18.02 by buildguy) : org.pentaho.di.core.exception.KettleException:
2017/10/30 14:27:43 - Table output 2.0 - Error inserting row into table [TableID] with values: [A], [I], [G], [1], [2016-02-18], [11], [2016-02-18-12.00.00.123456], [GG], [CB], [132], [null], [null], [null]
2017/10/30 14:27:43 - Table output 2.0 -
2017/10/30 14:27:43 - Table output 2.0 - Error inserting/updating row
2017/10/30 14:27:43 - Table output 2.0 - executeUpdate()
2017/10/30 14:27:43 - Table output 2.0 -
2017/10/30 14:27:43 - Table output 2.0 -
2017/10/30 14:27:43 - Table output 2.0 - at org.pentaho.di.trans.steps.tableoutput.TableOutput.writeToTable(TableOutput.java:385)
2017/10/30 14:27:43 - Table output 2.0 - at org.pentaho.di.trans.steps.tableoutput.TableOutput.processRow(TableOutput.java:125)
2017/10/30 14:27:43 - Table output 2.0 - at org.pentaho.di.trans.step.RunThread.run(RunThread.java:62)
2017/10/30 14:27:43 - Table output 2.0 - at java.lang.Thread.run(Unknown Source)
2017/10/30 14:27:43 - Table output 2.0 - Caused by: org.pentaho.di.core.exception.KettleDatabaseException:
2017/10/30 14:27:43 - Table output 2.0 - Error inserting/updating row
2017/10/30 14:27:43 - Table output 2.0 - executeUpdate()
2017/10/30 14:27:43 - Table output 2.0 -
2017/10/30 14:27:43 - Table output 2.0 - at org.pentaho.di.core.database.Database.insertRow(Database.java:1321)
2017/10/30 14:27:43 - Table output 2.0 - at org.pentaho.di.trans.steps.tableoutput.TableOutput.writeToTable(TableOutput.java:262)
2017/10/30 14:27:43 - Table output 2.0 - ... 3 more
2017/10/30 14:27:43 - Table output 2.0 - Caused by: net.starschema.clouddb.jdbc.BQSQLFeatureNotSupportedException: executeUpdate()
2017/10/30 14:27:43 - Table output 2.0 - at net.starschema.clouddb.jdbc.BQPreparedStatement.executeUpdate(BQPreparedStatement.java:317)
2017/10/30 14:27:43 - Table output 2.0 - at org.pentaho.di.core.database.Database.insertRow(Database.java:1288)
2017/10/30 14:27:43 - Table output 2.0 - ... 4 more
2017/10/30 14:27:43 - BigQuery_rwa-tooling - Statement canceled!
2017/10/30 14:27:43 - Simple Read Write from csv to txt - ERROR (version 7.1.0.0-12, build 1 from 2017-05-16 17.18.02 by buildguy) : Something went wrong while trying to stop the transformation: org.pentaho.di.core.exception.KettleDatabaseException:
2017/10/30 14:27:43 - Simple Read Write from csv to txt - Error cancelling statement
2017/10/30 14:27:43 - Simple Read Write from csv to txt - cancel()
2017/10/30 14:27:43 - Simple Read Write from csv to txt - ERROR (version 7.1.0.0-12, build 1 from 2017-05-16 17.18.02 by buildguy) : org.pentaho.di.core.exception.KettleDatabaseException:
2017/10/30 14:27:43 - Simple Read Write from csv to txt - Error cancelling statement
2017/10/30 14:27:43 - Simple Read Write from csv to txt - cancel()
2017/10/30 14:27:43 - Simple Read Write from csv to txt -
2017/10/30 14:27:43 - Simple Read Write from csv to txt - at org.pentaho.di.core.database.Database.cancelStatement(Database.java:750)
2017/10/30 14:27:43 - Simple Read Write from csv to txt - at org.pentaho.di.core.database.Database.cancelQuery(Database.java:732)
2017/10/30 14:27:43 - Simple Read Write from csv to txt - at org.pentaho.di.trans.steps.tableinput.TableInput.stopRunning(TableInput.java:299)
2017/10/30 14:27:43 - Simple Read Write from csv to txt - at org.pentaho.di.trans.Trans.stopAll(Trans.java:1889)
2017/10/30 14:27:43 - Simple Read Write from csv to txt - at org.pentaho.di.trans.step.BaseStep.stopAll(BaseStep.java:2915)
2017/10/30 14:27:43 - Simple Read Write from csv to txt - at org.pentaho.di.trans.steps.tableoutput.TableOutput.processRow(TableOutput.java:139)
2017/10/30 14:27:43 - Simple Read Write from csv to txt - at org.pentaho.di.trans.step.RunThread.run(RunThread.java:62)
2017/10/30 14:27:43 - Simple Read Write from csv to txt - at java.lang.Thread.run(Unknown Source)
2017/10/30 14:27:43 - Simple Read Write from csv to txt - Caused by: net.starschema.clouddb.jdbc.BQSQLFeatureNotSupportedException: cancel()
2017/10/30 14:27:43 - Simple Read Write from csv to txt - at net.starschema.clouddb.jdbc.BQStatementRoot.cancel(BQStatementRoot.java:113)
2017/10/30 14:27:43 - Simple Read Write from csv to txt - at org.pentaho.di.core.database.Database.cancelStatement(Database.java:744)
2017/10/30 14:27:43 - Simple Read Write from csv to txt - ... 7 more
2017/10/30 14:27:43 - Table output 2.0 - Signaling 'output done' to 0 output rowsets.
2017/10/30 14:27:43 - BigQuery_prID - No commit possible on database connection [BigQuery_prID]

It looks like you may be trying to do this via legacy SQL, which has no support for DML statements (INSERT/UPDATE/DELETE).
Standard SQL does support DML, but those statements exist largely to support bulk table manipulation rather than row-oriented insertion; ingesting data via individual DML INSERTs is not recommended. See the quotas in the DML reference documentation for more details.
You're better off ingesting data with either BigQuery streaming inserts or a bulk load job, but since these mechanisms sit outside the query language, you may need to move beyond the JDBC driver.
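For example, here is a minimal sketch of a streaming insert using the google-cloud-bigquery Java client (the dataset, table, and column names are placeholders, and authentication via Application Default Credentials is assumed to be set up separately):

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.InsertAllRequest;
import com.google.cloud.bigquery.InsertAllResponse;
import com.google.cloud.bigquery.TableId;
import java.util.HashMap;
import java.util.Map;

public class StreamingInsertSketch {
    public static void main(String[] args) {
        // The client picks up the project and credentials from the environment.
        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

        // Placeholder dataset/table -- replace with your own.
        TableId tableId = TableId.of("my_dataset", "my_table");

        // One row keyed by column name; values must match the table schema.
        Map<String, Object> row = new HashMap<>();
        row.put("code", "A");
        row.put("load_date", "2016-02-18");

        InsertAllResponse response = bigquery.insertAll(
                InsertAllRequest.newBuilder(tableId).addRow(row).build());

        if (response.hasErrors()) {
            // Per-row errors, keyed by the index of the failing row.
            response.getInsertErrors().forEach(
                    (index, errors) -> System.err.println("Row " + index + ": " + errors));
        }
    }
}

Keep in mind that streaming inserts are billed separately from load jobs, and streamed rows sit in a buffer for a short while before they are available to copy or export; for periodic batch loads out of PDI, writing a file and submitting a load job is usually the better fit.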

Related

HIVE MSCK ERROR - org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1)

I am ingesting data from one cluster to another, and I am using Beeline to run an MSCK file from the first cluster.
MSCK was working until Feb 27; after that I started getting the error messages below.
INFO : Executing command: MSCK REPAIR TABLE cubcus_display
INFO : Starting task [Stage-0:DDL] in serial mode
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1)
I have tried https://www.ibm.com/support/pages/running-command-hive-table-results-running-msck-error-error-while-processing-statement-failed-error, but it did not work. How can we solve this?
Beeline version: 1.2.1000.2.6.5.0-292
Hive Version: 3.1.0.3.1.5.0-152

HIVE - ORC read Issue with NULL Decimal Values - java.io.EOFException: Reading BigInteger past EOF

I encountered an issue with Hive when loading an ORC external table with NULLs in a column defined as DECIMAL(31,8). It looks like Hive is unable to read the ORC file after loading and can no longer view the records with a NULL in that field. Other records in the same ORC file can be read fine.
This has only occurred recently, and we have made no changes to our Hive version. Surprisingly, previous ORC files that were loaded into the same table and have NULLs in the DECIMAL field are queryable without issue.
We are using Hive 1.2.1. The full stack trace spat out by Hive is below; I've replaced the actual HDFS location with <hdfs location>.
org.apache.hive.service.cli.HiveSQLException: java.io.IOException: java.io.IOException: Error reading file: <hdfs location>
at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:352)
at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:220)
at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:685)
at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:454)
at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:672)
at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1553)
at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1538)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: java.io.IOException: Error reading file: <hdfs location>
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:507)
at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:414)
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:140)
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1670)
at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:347)
... 13 more
Caused by: java.io.IOException: Error reading file: <hdfs location>
at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:1051)
at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$OriginalReaderPair.next(OrcRawRecordMerger.java:263)
at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.next(OrcRawRecordMerger.java:547)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$1.next(OrcInputFormat.java:1235)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$1.next(OrcInputFormat.java:1219)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$NullKeyRecordReader.next(OrcInputFormat.java:1151)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$NullKeyRecordReader.next(OrcInputFormat.java:1137)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:474)
... 17 more
Caused by: java.io.EOFException: Reading BigInteger past EOF from compressed stream Stream for column 6 kind DATA position: 201 length: 201 range: 0 offset: 289 limit: 289 range 0 = 0 to 201 uncompressed: 362 to 362
at org.apache.hadoop.hive.ql.io.orc.SerializationUtils.readBigInteger(SerializationUtils.java:176)
at org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$DecimalTreeReader.next(TreeReaderFactory.java:1264)
at org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$StructTreeReader.next(TreeReaderFactory.java:2004)
at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:1044)
... 24 more
Set this before running your query: hive.fetch.task.conversion=none
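If you are querying through HiveServer2 (for example over JDBC) rather than the Hive CLI, one way to apply this per session is to issue a SET statement before the query; a rough sketch, with the connection URL, credentials, and table name as placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class FetchTaskConversionSketch {
    public static void main(String[] args) throws Exception {
        // Hive JDBC driver; host, port, user, and password are placeholders.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://hiveserver2-host:10000/default", "user", "password");
             Statement stmt = conn.createStatement()) {

            // Disable fetch-task conversion so the SELECT runs as a regular job
            // instead of the direct FetchOperator path that fails in the trace above.
            stmt.execute("SET hive.fetch.task.conversion=none");

            try (ResultSet rs = stmt.executeQuery("SELECT * FROM my_orc_table LIMIT 10")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }
}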

Spoon PDI Data Validator Error

I am trying to validate that an assignment is correct. I can't say much, but we have internal and external users. I have a SQL script that looks for anything other than internal on an internal assignment; the result should be 0 rows. I then place this in a SQL table. After that, I've got a statement to calculate whether there is an assignment error, and I store that in a variable. Based on this, I try to validate the data with the 'Data Validator' step. Running the code manually, it should pass; however, Spoon PDI is giving me the following error:
2015/05/04 13:03:19 - Data Validator.0 - ERROR (version 5.2.0.0, build 1 from 2014-09-30_19-48-28 by buildguy) : Unexpected error
2015/05/04 13:03:19 - Data Validator.0 - ERROR (version 5.2.0.0, build 1 from 2014-09-30_19-48-28 by buildguy) : org.pentaho.di.core.exception.KettleException:
2015/05/04 13:03:19 - Data Validator.0 - Correct Group/Dashboard Assignment
2015/05/04 13:03:19 - Data Validator.0 - Correct Group/Dashboard Assignment
2015/05/04 13:03:19 - Data Validator.0 -
2015/05/04 13:03:19 - Data Validator.0 - at org.pentaho.di.trans.steps.validator.Validator.processRow(Validator.java:159)
2015/05/04 13:03:19 - Data Validator.0 - at org.pentaho.di.trans.step.RunThread.run(RunThread.java:62)
2015/05/04 13:03:19 - Data Validator.0 - at java.lang.Thread.run(Unknown Source)
2015/05/04 13:03:19 - Data Validator.0 - Caused by: org.pentaho.di.trans.steps.validator.KettleValidatorException: Correct Group/Dashboard Assignment
2015/05/04 13:03:19 - Data Validator.0 - at org.pentaho.di.trans.steps.validator.Validator.validateFields(Validator.java:258)
2015/05/04 13:03:19 - Data Validator.0 - at org.pentaho.di.trans.steps.validator.Validator.processRow(Validator.java:130)
2015/05/04 13:03:19 - Data Validator.0 - ... 2 more
2015/05/04 13:03:19 - Data Validator.0 - Finished processing (I=0, O=0, R=1, W=0, U=0, E=1)
2015/05/04 13:03:19 - transformation_group_dashboard_validator - ERROR (version 5.2.0.0, build 1 from 2014-09-30_19-48-28 by buildguy) : Errors detected!
2015/05/04 13:03:19 - Spoon - The transformation has finished!!
2015/05/04 13:03:19 - transformation_group_dashboard_validator - ERROR (version 5.2.0.0, build 1 from 2014-09-30_19-48-28 by buildguy) : Errors detected!
2015/05/04 13:03:19 - transformation_group_dashboard_validator - ERROR (version 5.2.0.0, build 1 from 2014-09-30_19-48-28 by buildguy) : Errors detected!
2015/05/04 13:03:19 - transformation_group_dashboard_validator - Transformation detected one or more steps with errors.
2015/05/04 13:03:19 - transformation_group_dashboard_validator - Transformation is killing the other steps!
Is there any way I can fix this?
It looks like the validator is rejecting your input(s), and according to that line in the source code, it isn't handling errors, so all you get is an exception. Try creating another step linked to that validator, then right-click on the validator, choose "Define error handling...", and set up some error-related fields that the step will fill in. You will also want to double-click on the Data Validator step and make sure the "Report all errors" and "...concatenate all errors" checkboxes are selected. That will ensure each row gets a full list of any validation errors that occurred.
This often happens when the validation conditions are not set the way the user intended them to be, so rows are rejected when they "should be" selected :)
I managed to fix my problem by deleting my Data Validator step and re-adding a fresh one. I've noticed this with Spoon PDI a lot: the end outcome can sometimes be unpredictable, and an occasional refresh of a step fixes the issue.

SerDe problems with Hive 0.12 and Hadoop 2.2.0-cdh5.0.0-beta2

The title is a bit weird because I'm having difficulty narrowing down the problem. I used my solution on Hadoop 2.0.0-cdh4.4.0 and Hive 0.10 without issues.
I can't create a table using this SerDe: https://github.com/rcongiu/Hive-JSON-Serde
first try:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.hadoop.hive.serde2.objectinspector.primitive.AbstractPrimitiveJavaObjectInspector.<init>(Lorg/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorUtils$PrimitiveTypeEntry;)V
second try:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Could not initialize class org.openx.data.jsonserde.objectinspector.JsonObjectInspectorFactory
I can create a table with this SerDe: https://github.com/cloudera/cdh-twitter-example
I create an external table with tweets from Flume. I can't do "SELECT * FROM tweets;":
FAILED: RuntimeException org.apache.hadoop.hive.ql.metadata.HiveException: Failed with exception java.lang.ClassNotFoundException: com.cloudera.hive.serde.JSONSerDejava.lang.RuntimeException: java.lang.ClassNotFoundException: com.cloudera.hive.serde.JSONSerDe
I can do SELECT id, text FROM tweets;
I can do a SELECT COUNT(*) FROM tweets;
I can't self join this table:
Execution log at: /tmp/jochen.debie/jochen.debie_20140311121313_164611a9-b0d8-4e53-9bda-f9f7ac342aaf.log
2014-03-11 12:13:30 Starting to launch local task to process map join; maximum memory = 257294336
Execution failed with exit status: 2
Obtaining error information
Task failed!
Task ID:
Stage-5
mentioned execution log:
2014-03-11 12:13:30,331 ERROR mr.MapredLocalTask (MapredLocalTask.java:executeFromChildJVM(324)) - Hive Runtime Error: Map local work failed
org.apache.hadoop.hive.ql.metadata.HiveException: Failed with exception java.lang.ClassNotFoundException: com.cloudera.hive.serde.JSONSerDejava.lang.RuntimeException: java.lang.ClassNotFoundException: com.cloudera.hive.serde.JSONSerDe
Does anyone know how to fix this, or at least show me where the problem is?
EDIT: Could it be a problem that I built the SerDe against Hadoop 2.0.0-cdh4.4.0 and Hive 0.10?
From what I've seen, Hive 0.11+ has a bug in joins with custom SerDes.
https://github.com/Esri/gis-tools-for-hadoop/issues/9
You might try the workaround of copying the JAR file containing the SerDe class to $HIVE_HOME/lib.
(I see in your question that you got ClassNotFoundException both in joins and in other cases; so far, the times I have encountered this were all with joins.)
[Edit] Another workaround is to use HADOOP_CLASSPATH:
env HADOOP_CLASSPATH=some.jar:other.jar hive ...
[Edit] The workaround applies to Hive versions 0.11 and 0.12; Hive 0.13 and above contain the fix for HIVE-6670.
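As a quick sanity check (my own addition, not part of the workaround above), you can confirm that the jar you are copying actually contains the SerDe class and see where it resolves from on a given classpath; a small sketch using the class name from the ClassNotFoundException:

public class SerdeClasspathCheck {
    public static void main(String[] args) {
        // Class name taken from the ClassNotFoundException above; adjust for your SerDe.
        String serdeClass = "com.cloudera.hive.serde.JSONSerDe";
        try {
            Class<?> clazz = Class.forName(serdeClass);
            // Prints the jar (or directory) the class was actually loaded from.
            System.out.println(clazz.getProtectionDomain().getCodeSource().getLocation());
        } catch (ClassNotFoundException e) {
            System.err.println(serdeClass + " is not visible on this classpath");
        }
    }
}

Run it against the same jar(s) you put on HADOOP_CLASSPATH, e.g. java -cp your-serde.jar:. SerdeClasspathCheck (the jar name is a placeholder); if the class does not resolve here, it will not resolve in the map-join child JVM either.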

Reading hive table using Pig script

I am trying to read a Hive table using a Pig script, but when I run the Pig code to read a table in Hive, it gives me the following error:
2014-02-12 15:48:36,143 [main] WARN org.apache.hadoop.hive.conf.HiveConf
-hive-site.xml not found on CLASSPATH 2014-02-12 15:49:10,781 [main] ERROR
org.apache.pig.tools.grunt.Grunt - ERROR 2997: Unable to recreate
exception from backed error: Error: Found class
org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected
(Ignore newlines and whitespace added for readability)
Hadoop version
1.1.1
Hive version
0.9.0
Pig version
0.10.0
Pig code
a = LOAD '/user/hive/warehouse/test' USING
org.apache.pig.piggybank.storage.HiveColumnarLoader('name string');
Is it due to some version mismatch?
Why can't you use HCatalog to access Hive metadata in Pig? Check this for an example.
