How to distribute data from GridDB to different data warehouses? - jdbc

I am currently collecting data from different PLC devices to GridDB for storage. The format of the data is as follows:
D_NAME  | DATA      | MSG
--------|-----------|-------
Siemens | 2021/10/4 | acbdfg
Omron   | 2021/10/4 | ponged
Rows whose device name is Siemens need to be distributed to Oracle, and rows whose device name is Omron to PostgreSQL, with the distribution running automatically at a fixed time every day.
I'm currently trying to use PDI (Pentaho Data Integration) together with Windows schtasks, filtering the data with the following SQL statement:
select * from t2021104 where D_NAME='Siemens'
I also checked the availability of the database drivers; all of them can connect and work normally.
But unfortunately, I got the following error message:
2022/12/03 08:53:32 - OUT-Griddb.0 - Caused by: java.sql.SQLFeatureNotSupportedException: [147001:JDBC_NOT_SUPPORTED] Currently not supported
2022/12/03 08:53:32 - OUT-Griddb.0 - at com.toshiba.mwcloud.gs.sql.internal.SQLErrorUtils.errorNotSupportedFeature(SQLErrorUtils.java:241)
2022/12/03 08:53:32 - OUT-Griddb.0 - at com.toshiba.mwcloud.gs.sql.internal.SQLErrorUtils.errorNotSupported(SQLErrorUtils.java:204)
2022/12/03 08:53:32 - OUT-Griddb.0 - at com.toshiba.mwcloud.gs.sql.internal.SQLPreparedStatement.setBigDecimal(SQLPreparedStatement.java:148)
2022/12/03 08:53:32 - OUT-Griddb.0 - at org.pentaho.di.core.row.value.ValueMetaBase.setPreparedStatementValue(ValueMetaBase.java:5472)
It seems that the GridDB JDBC driver does not support this operation, but when I checked further, both my own program and DBeaver could execute DDL and DML statements normally.
I also tried writing data from PostgreSQL to GridDB and found that it executed successfully without any error message.
I checked GridDB's logs and didn't find any errors there either.
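The stack trace points at the write side: PDI binds its Number/BigNumber fields through PreparedStatement.setBigDecimal(), which the GridDB JDBC driver rejects with JDBC_NOT_SUPPORTED. A possible workaround, assuming some numeric field in the flow is carried as a BigNumber (the column NUM_COL below is hypothetical), is to cast it in the extraction query so PDI binds it as an integer or string instead; alternatively, a Select Values step in PDI can change the field's metadata type before the output step:

-- Hypothetical sketch: cast any DECIMAL/BigNumber field up front so the
-- output step never has to call setBigDecimal() against GridDB.
select D_NAME, MSG, CAST(NUM_COL AS BIGINT) AS NUM_COL
from t2021104
where D_NAME='Siemens'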

Related

Error in SQL statement Arithmetic operation resulted in an overflow

I am trying to connect through a 64-bit ODBC driver to allow me to execute a query that extracts data from JDE 8.12.
After opening the SQL connection and executing a simple query, the error "Error in SQL statement Arithmetic operation resulted in an overflow." appears.
Could you please advise what is missing in order to allow the query to execute?
Steps which I did:
1. Selected Microsoft OLE DB Provider for ODBC Drivers
2. Selected the driver (Driver - Oracle in OraClient 11g64_home1)
3. Tested the connection, which showed as successful
4. Built a simple query to test the flow
5. Ran the flow and got the error "Error in SQL statement Arithmetic operation resulted in an overflow."
After the successful connection test, I was expecting the simple query to execute without issue.
It is possible that your Oracle client is 32-bit only and the application using it is not compiled as x86.
It is also possible that you have a numeric field with a certain precision limit, and a value is being computed arithmetically and then stored into that field, overflowing its bounds.
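If it is the latter, one thing to try — sketched here with illustrative table and column names, since the actual schema isn't shown — is to cast the computed expression to a wider numeric type in the query itself, so the result is sized before the driver buffers it:

-- Hypothetical sketch: widen the arithmetic result explicitly
-- (SOME_JDE_TABLE, QTY, UNIT_PRICE are illustrative names).
SELECT CAST(t.QTY * t.UNIT_PRICE AS NUMBER(18, 2)) AS TOTAL_AMOUNT
FROM SOME_JDE_TABLE t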

Failure loading parquet in Synapse Analytics - INT mapped as UTF8

We have an on-premises Oracle database from which we need to extract data and store it in a Synapse dedicated pool. I have created a Synapse pipeline which first copies the data from Oracle to a data lake as a parquet file, which should then be imported into Synapse using a second copy task.
The data from Oracle is extracted through a dynamically created query. This query has 2 hard-coded INT values which are generated at runtime. The query runs fine and the parquet file is created correctly, but if I use PolyBase or the COPY command to import the file into Synapse it fails with the following error:
"errorCode": "2200",
"message": "ErrorCode=UserErrorSqlDWCopyCommandError,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=SQL DW Copy Command operation failed with error 'HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: ClassCastException: ',Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Data.SqlClient.SqlException,Message=HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: ClassCastException: ,Source=.Net SqlClient Data Provider,SqlErrorNumber=106000,Class=16,ErrorCode=-2146232060,State=1,Errors=[{Class=16,Number=106000,State=1,Message=HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: ClassCastException: ,},],'",
Bulk insert works but is less efficient on large quantities of data so I don't want to use that.
The mapping for the copy activities is created dynamically based on the target database table definition. However, when I created a separate copy task and imported the mapping to check what is going on, I noticed that the 2 INT columns are mapped as UTF8 on the parquet source side. The sink table is INT32. When I exclude both columns the copy task completes successfully. It seems that the copy activity fails because it cannot implicitly cast a string to an integer.
The 2 columns are explicitly cast as integers in the Oracle query that is the source for the parquet file.
SELECT t.*
, CAST(419 AS INT) AS "Execution_id"
, CAST(4832 AS INT) AS "Task_id"
, TO_DATE('2022-07-05 14:40:34', 'YYYY-MM-DD HH24:MI:SS') AS "ProcessedDTS"
, t.DEMUTDT AS "EffectiveDTS"
FROM CBO.DRKASTR t
WHERE DEMUTDT >= TO_DATE('2022-07-05 13:37:35', 'YYYY-MM-DD HH24:MI:SS');
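One possible explanation, stated here as an assumption rather than a confirmed cause: in Oracle, INT is an alias for NUMBER(38,0), and 38 digits of precision cannot fit parquet's INT32 physical type, so the copy activity may fall back to a decimal/string representation. Constraining the precision in the source query is one thing to try, since NUMBER(9) fits comfortably within INT32:

-- Hypothetical variant of the two CAST lines above:
, CAST(419 AS NUMBER(9)) AS "Execution_id"
, CAST(4832 AS NUMBER(9)) AS "Task_id"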
Adding an explicit Oracle-to-parquet mapping that maps them as INT also doesn't solve the problem.
How do I prevent these 2 columns from being interpreted as strings instead of integers?!
We ended up resolving this by first importing the data as strings into the database and casting them to the correct data type during further processing.
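For reference, a minimal sketch of that workaround, assuming a staging table that lands the two columns as strings (all object names here are illustrative):

-- Stage the columns as strings, then cast while moving them onward.
INSERT INTO dbo.TargetTable (Execution_id, Task_id, ProcessedDTS, EffectiveDTS)
SELECT CAST(Execution_id AS INT)
     , CAST(Task_id AS INT)
     , ProcessedDTS
     , EffectiveDTS
FROM stg.DRKASTR;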

SSIS For Loop ODBC

I’m extracting data from a table in Oracle.
I have an ODBC connection manager to the Oracle database, and the extraction query needs a WHERE clause because the table contains transactional data and there is no reason to extract all of it every time.
I want to initialize the table once, doing it with a For Loop that iterates over the whole table.
Since it's an ODBC connection I can't simply put a variable into the WHERE clause, so I realized I need to parameterize the Data Flow task and write my query into the SqlCommand property of the ODBC source.
The property value is:
SELECT *
FROM DDC.DDC_SALES_TBL
WHERE trunc(CALDAY) between to_date('"+ #[User::vstart]+"','MM/DD/YYYY')
and to_date('"+ #[User::vstop]+"','MM/DD/YYYY')
where #[User::vstart] and #[User::vstop] are variables containing the from/to dates to extract, built with a DATEADD function and another variable (#[User::vcount]) that is supposed to be the iterator, as follows:
(DT_WSTR, 2) MONTH( DATEADD( "day", #[User::vcount] , GETDATE() ) )+"/"+
(DT_WSTR, 2) DAY( DATEADD( "day", #[User::vcount] , GETDATE() ) )+"/"+
(DT_WSTR, 4) YEAR( DATEADD( "day", #[User::vcount] , GETDATE() ) )
What’s happening is that the first iteration works fine but the second one generates an error and the package fails.
I marked the variable as EvaluateAsExpression=True
I also marked the DelayValidation=True in both the For Loop and the DataFlow tasks.
The errors are:
(1)Data Flow Task:Error: SQLSTATE: HY010, Message: [Microsoft][ODBC Driver Manager] Function sequence error;
(2) Data Flow Task:Error: SSIS Error Code DTS_E_INDUCEDTRANSFORMFAILUREONERROR. The "ODBC Source.Outputs[ODBC Source Output]" failed because error code 0xC020F450 occurred, and the error row disposition on "ODBC Source" specifies failure on error. An error occurred on the specified object of the specified component. There may be error messages posted before this with more information about the failure.
(3) Data Flow Task:Error: SSIS Error Code DTS_E_PRIMEOUTPUTFAILED. The PrimeOutput method on ODBC Source returned error code 0xC0209029. The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing. There may be error messages posted before this with more information about the failure.
Please assist.
I don't know why I didn't use OLE DB initially, as I thought it wouldn't work.
What I tried was to create an OLE DB connection via the Oracle driver, and the connection manager worked, so I used it.
This way you can parameterize the source directly, and the loop worked just fine.
I don't know what causes the conflict with the ODBC source, but that's my workaround; a sketch of the parameterized OLE DB query follows below.
I didn't find a way to set up the SqlCommand property on the ODBC source so that the command changes on every loop iteration. It crashed after the first iteration no matter what I tried.
Thanks,
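A minimal sketch of the OLE DB approach described above, assuming the two ? markers are mapped to User::vstart and User::vstop in the OLE DB source's parameter dialog:

-- Parameterized OLE DB source query; no per-iteration string rebuilding.
SELECT *
FROM DDC.DDC_SALES_TBL
WHERE trunc(CALDAY) BETWEEN to_date(?, 'MM/DD/YYYY')
                        AND to_date(?, 'MM/DD/YYYY')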
I was having the same issue when using the Oracle Source; updating the Attunity Connectors for Oracle as well as the OLE DB driver for SQL Server fixed the problem.

Hortonworks Hive ODBC Driver DB-240000

DB-240000 ODBC error: [Hortonworks][Hardy] (80) Syntax or semantic analysis error thrown in server while executing query.
Error message from server: Error while compiling statement: FAILED: ParseException line 1:12 cannot recognize input near 'ALL_ROWS' '*' '/' in hint name
SQLState: 37000
Sample query
WDB-200001 SQL statement 'SELECT /*+ ALL_ROWS */ A.test FROM table A' could not be executed.
Syntax looks right as per documentation (https://docs.oracle.com/cd/E11882_01/server.112/e41084/sql_elements006.htm#SQLRF51108)
Or is there a missing parameter in the ODBC configuration?
https://hortonworks.com/wp-content/uploads/2015/10/Hortonworks-Hive-ODBC-Driver-User-Guide.pdf
Use Native Query
Key Name: UseNativeQuery. Default Value: Clear (0). Required: No.
Description: When this option is enabled (1), the driver does not transform the queries emitted by an application, so the native query is used. When this option is disabled (0), the driver transforms the queries emitted by an application and converts them into an equivalent form in HiveQL. Note: If the application is Hive-aware and already emits Hive...
Could this be an issue with HDP versioning? Is there a missing param in the ODBC connection string?
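For what it's worth, ALL_ROWS is an Oracle optimizer hint, and the ParseException above suggests Hive's parser simply does not recognize it as a hint name. Under that assumption, the simplest fix is to drop the Oracle-specific hint (Hive has its own hint syntax, e.g. /*+ MAPJOIN(...) */):

-- The sample query without the Oracle-only hint (names as in the sample):
SELECT A.test FROM table A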

ERROR: CANNOT PARALLELIZE AN UPDATE STATEMENT THAT UPDATES THE DISTRIBUTION COLUMNS

When trying to copy data from a source (MSSQLSERVER) to a target (Greenplum database) using the Talend ETL server, the mentioned error is thrown while executing an UPDATE statement against Greenplum.
GIVEN
- Number of records fetched into the target is ~0.3 million
- The update fails with the error:
ERROR: CANNOT PARALLELIZE AN UPDATE STATEMENT THAT UPDATES THE DISTRIBUTION COLUMNS current transaction is aborted, commands ignored until end of transaction block
Any help on it would be much appreciated.
Solution I tried:
When ON_ERROR_ROLLBACK is enabled, psql will issue a SAVEPOINT before every command you send to Greenplum:
gpadmin=# \set ON_ERROR_ROLLBACK interactive
But after that we ran the same job again, and it did not solve the problem.
1) UPDATE is not supported in HAWQ.
2) In GPDB, UPDATE is supported only on heap tables, not on append-optimized (AO) tables.
GPDB/HAWQ are intended for data warehouse/BI and data exploration purposes.
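In addition, the error text itself says the UPDATE touches a distribution column, which Greenplum cannot parallelize. A hedged sketch of a common workaround, with illustrative table and column names: leave the distribution key out of the UPDATE, or replace the row instead of updating it in place.

-- Instead of: UPDATE sales_tbl SET dist_key = 'EMEA' WHERE id = 42;
BEGIN;
DELETE FROM sales_tbl WHERE id = 42;
INSERT INTO sales_tbl (id, dist_key, amount) VALUES (42, 'EMEA', 100.00);
COMMIT;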
