MonetDB failed to bulk load: 42000 syntax error

I'm trying to bulk load some data into MonetDB.
I followed this example.
It works for my test data, but when I use the data from the production environment, I get exceptions:
Caused by: java.lang.Exception: 42000!syntax error, unexpected IDENT, expecting DELIMITERS in: "copy into dm.fact_sem_keyword_collection from stdin using delimters"
25005!current transaction is aborted (please ROLLBACK)
at com.lietou.bi.dw.task.common.dataexchange.MonetDBBulkLoadWriter.flush(MonetDBBulkLoadWriter.java:140)
... 12 more
I found the following entries in the debug log:
20150804x_m_0001323100000001255480000000001310100152015-08-05 10:28:44
20150804x_z_0002000000000001000000000000000002015-08-05 10:28:44
20150804xn_cd_01000000000002000000000000000002015-08-05 10:28:44
20150804z_s_0002000000000001000000000000000002015-08-05 10:28:44
RD 1438779576672: read final block: 190 bytes
RX 1438779576672: !42000!syntax error, unexpected IDENT, expecting DELIMITERS in: "copy into dm.fact_sem_keyword_collection from stdin using delimters"
!25005!current transaction is aborted (please ROLLBACK)
RD 1438779576672: inserting prompt
I think there must be some bad data, but I don't know how to get more details to help me locate it.
One more thing: MonetDB seems to read data using a buffer size of 8190 bytes, but right before the exception there is a "write final block" of a different size. The logs are shown below. What does this mean?
...
TD 1438779576129: write block: 8190 bytes
TX 1438779576129:
...
TD 1438779576137: write block: 8190 bytes
TX 1438779576137:
...
TD 1438779576137: write final block: 5921 bytes
TX 1438779576137:
...
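For reference, the statement quoted in the error ends in "using delimters" rather than USING DELIMITERS, which by itself would explain the "unexpected IDENT, expecting DELIMITERS" message. A corrected bulk-load statement would look roughly like the sketch below (the table name is taken from the error message; the delimiter characters are assumptions):
COPY INTO dm.fact_sem_keyword_collection FROM STDIN
USING DELIMITERS '\t', '\n';   -- field and record separators are assumptions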

Related

Facing Pipeline Busy issue while loading data in Greenplum using DataStage

We are getting errors while loading data (large volumes) into Greenplum through DataStage jobs.
There are multiple jobs running sequentially. No particular job fails; it happens randomly. If Job1 fails on the 1st day, Job2 fails on the 2nd day.
We have also observed that it only impacts the jobs that have to load a high volume of data.
Please find the errors we have got so far below.
Day 1
Message:
STG_DEPS_G,0: The following SQL statement failed: INSERT INTO GPCC_ET_20211114015751397_14836_2 SELECT DEPT, DEPT_NAME, BUYER, MERCH, PROFIT_CALC_TYPE, PURCHASE_TYPE, GROUP_NO, BUD_INT, BUD_MKUP, TOTAL_MARKET_AMT, MARKUP_CALC_TYPE, OTB_CALC_TYPE, MAX_AVG_COUNTER, AVG_TOLERANCE_PCT, DEPT_VAT_INCL_IND, CREATE_ID, CREATE_DATETIME FROM staging.STG_DEPS. The statement reported the following reason: [SQLCODE=08S01][Native=373,254] [IBM (DataDirect OEM)][ODBC Greenplum Wire Protocol driver][Greenplum]ERROR: http response code 501 from gpfdist (gpfdist://DDCETLMIG:8000/DDCETLMIG_14836_gpw_11_3_20211114015751366): HTTP/1.0 501 pipe is busy, close the pipe and try again (seg0 192.168.199.10:6000 pid=25824)(File url_curl.c; Line 474; Routine check_response; ) (CC_GPCommon::checkThreadStatusThrow, file CC_GPCommon.cpp, line 808)
Day 2
STG_RPM_ZONE,0: The following SQL statement failed: INSERT INTO GPCC_ET_20211114093430218_8212_0 SELECT ZONE_ID, ZONE_DISPLAY_ID, ZONE_GROUP_ID, NAME, CURRENCY_CODE, BASE_IND, LOCK_VERSION FROM STAGING.STG_RPM_ZONE. The statement reported the following reason: [SQLCODE=08S01][Native=373,254] [IBM (DataDirect OEM)][ODBC Greenplum Wire Protocol driver][Greenplum]ERROR: http response code 501 from gpfdist (gpfdist://DDCETLMIG:8004/DDCETLMIG_8212_gpw_0_0_20211114093430186): HTTP/1.0 501 pipe is busy, close the pipe and try again (seg1 192.168.199.11:6000 pid=26726)(File url_curl.c; Line 474; Routine check_response; ) (CC_GPCommon::checkThreadStatusThrow, file CC_GPCommon.cpp, line 808)
Day 3
Event type:Fatal
Timestamp:11/15/2021 9:27:36 AM
Message:
SUB_CLASS,3: APT_PMMessagePort::dispatch:ERROR: header = 04F02E20SUBPROC_SUPPORT_EOW, savedDispatchPosition = 04F02E20, currentDispatchPosition_ = 04F02E1FS, currentInputPosition_ = 04F02E58, buffer_ = 04F02E20, this = 04EEA1E0
Day 4
Message:
STG_GROUPS_G,0: The following SQL statement failed: INSERT INTO GPCC_ET_20211115015013039_2400_0 SELECT GROUP_NO, GROUP_NAME, BUYER, MERCH, DIVISION, CREATE_ID, CREATE_DATETIME FROM staging.STG_GROUPS. The statement reported the following reason: [SQLCODE=08S01][Native=373,254] [IBM (DataDirect OEM)][ODBC Greenplum Wire Protocol driver][Greenplum]ERROR: http response code 501 from gpfdist (gpfdist://DDCETLMIG:8009/DDCETLMIG_2400_gpw_1_1_20211115015013023): HTTP/1.0 501 pipe is busy, close the pipe and try again (seg5 192.168.199.12:6001 pid=1167)(File url_curl.c; Line 474; Routine check_response; ) (CC_GPCommon::checkThreadStatusThrow, file CC_GPCommon.cpp, line 808)
A couple of questions:
You indicated there are multiple jobs running sequentially. Does that mean there are multiple concurrent jobs, each running a set of sequential steps, or is there one job running multiple sequential steps? If it is the first, I would check to make sure no jobs are using the same gpfdist ports.
Is DataStage connecting through a proxy server? That could potentially be causing a problem (a common cause of 501 errors, according to a quick search).
Finally, what is the throughput speed of the NIC or NICs being used to connect DataStage/gpfdist to Greenplum? Also, what is the interconnect speed between each segment/segment host in Greenplum? If either of these is less than 10 Gb, your network may not be able to support really high throughput.

Import file failed to Greenplum because of one line of data in Navicat

When importing a file into Greenplum, one line fails and the whole file is not imported. Is there a way to skip the bad line and import the rest of the data into Greenplum successfully?
Here is my SQL statement and the error messages:
copy cjh_test from '/gp_wkspace/outputs/base_tables/error_data_test.csv' using delimiters ',';
ERROR: invalid input syntax for integer: "FE00F760B39BD3756BCFF30000000600"
CONTEXT: COPY cjh_test, line 81, column local_city: "FE00F760B39BD3756BCFF30000000600"
Greenplum has an extension to the COPY command that lets you log errors and allow a certain number of errors to occur without stopping the load. Here is an example from the documentation for the COPY command:
COPY sales FROM '/home/usr1/sql/sales_data' LOG ERRORS
SEGMENT REJECT LIMIT 10 ROWS;
That tells COPY that 10 bad rows can be ignored without stopping the load. The reject limit can be a number of rows or a percentage of the load file. You can check the full syntax in psql with: \h copy
If you are loading a very large file into Greenplum, I would suggest looking at gpload or gpfdist (which also support the segment reject limit syntax). COPY is single-threaded through the master server, while gpload/gpfdist load the data in parallel to all segments. COPY will be faster for smaller load files, and the others will be faster for millions of rows in a load file (or files).
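Applied to the statement from the question, a sketch could look like the following (the DELIMITER clause mirrors the original command, and the reject limit of 10 rows is just the value from the documentation example; exact option syntax can vary by Greenplum version):
COPY cjh_test FROM '/gp_wkspace/outputs/base_tables/error_data_test.csv'
WITH DELIMITER ','
LOG ERRORS SEGMENT REJECT LIMIT 10 ROWS;
Rejected rows can then be inspected afterwards, for example with gp_read_error_log('cjh_test') in recent Greenplum versions.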

AWS DMS - Oracle to PG RDS full load operation error - failed to load data from csv file

I am trying to move data from an Oracle instance to Postgres RDS using DMS. I am only doing a full load operation, and I have disabled all the foreign keys on the target. I also made sure that the datatypes are not mismatched between columns for the same tables. I tried both 'Do Nothing' and 'Truncate' for the Target table preparation mode, and when I run the task, several tables fail with the error messages below:
[TARGET_LOAD ]E: Command failed to load data with exit error code 1, Command output: <truncated> [1020403] (csv_target.c:981)
[TARGET_LOAD ]E: Failed to wait for previous run [1020403] (csv_target.c:1578)
[TARGET_LOAD ]E: Failed to load data from csv file. [1020403] (odbc_endpoint_imp.c:5648)
[TARGET_LOAD ]E: Handling End of table 'public'.'SKEWED_VALUES' loading failed by subtask 6 thread 1 [1020403] (endpointshell.c:2416)
DMS doesn't give the exact error information, and I am not able to understand what the above error messages mean.
When I use 'Drop tables on target' for the Target table preparation mode, it works, but it creates the columns with different datatypes, which I don't want.
Any help would be appreciated.
To troubleshoot my case, I created a copy of the task that only loaded the one problem table, and upped all the logging severities to "Detailed debug". Then I was able to see this:
[TARGET_LOAD ]D: RetCode: SQL_SUCCESS_WITH_INFO SqlState: 42622 NativeError: -1 Message: NOTICE: identifier "diagnosticinterpretationrequestdata_diagnosticinterpretationcode" will be truncated to "diagnosticinterpretationrequestdata_diagnosticinterpretationcod" (ar_odbc_stmt.c:4720)
In the RDS logs for the target DB I found:
2021-10-11 14:30:36 UTC:...:[19259]:ERROR: invalid input syntax for integer: ""
2021-10-11 14:30:36 UTC:...:[19259]:CONTEXT: COPY diagnosticinterpretationrequest, line 1, column diagnosticinterpretationrequestdata_diagnosticinterpretationcod: ""
2021-10-11 14:30:36 UTC:...:[19259]:STATEMENT: COPY "myschema"."diagnosticinterpretationrequest" FROM STDIN WITH DELIMITER ',' CSV NULL 'attNULL' ESCAPE '\'
I found that if I added a table mapping rule to explicitly rename the column, truncating the name to fit within Postgres's identifier length limit, then things ran OK:
{
  "rule-type": "transformation",
  "rule-id": "1",
  "rule-name": "1",
  "rule-target": "column",
  "object-locator": {
    "schema-name": "%",
    "table-name": "%",
    "column-name": "diagnosticinterpretationrequestdata_diagnosticinterpretationcode"
  },
  "rule-action": "rename",
  "value": "diagnosticinterpretationrequestdata_diagnosticinterpretationcod",
  "old-value": null
},
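For context, this fragment goes inside the "rules" array of the task's table mappings JSON, alongside the existing selection rule(s); "rule-id" and "rule-name" just need to be unique within that array.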

RMAN error during auxiliary backup

I am getting an error while doing auxiliary cloning of a database across two different servers using an RMAN backup, with the command below:
duplicate database to "HFSDBRED" backup location '/orabackup/RMAN/HFSDBRED_BKP' nofilenamecheck set DB_FILE_NAME_CONVERT=('/oradata/HFSDBPRD/datafile','/oradata/HFSDBRED/datafile') set LOG_FILE_NAME_CONVERT=('/oradata/HFSDBPRD/onlinelog','/oradata/HFSDBRED/onlinelog','/optware/oracle/HFSDBPRD/onlinelog','/optware/oracle/HFSDBRED/onlinelog');
Error:
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-00558: error encountered while parsing input commands
RMAN-01009: syntax error: found "identifier": expecting one of: "backup, db_file_name_convert, device, dorecover, force, from, high, logfile, nofilenamecheck, noredo, noresume, open, password, pfile, skip readonly, skip, spfile, tablespace, to restore point, undo, until restore point, until, ;"
RMAN-01008: the bad identifier was: LOG_FILE_NAME_CONVERT
RMAN-01007: at line 3 column 1 file: standard input
You should use the DB_FILE_NAME_CONVERT and LOG_FILE_NAME_CONVERT values without parentheses, as follows:
set DB_FILE_NAME_CONVERT='/oradata/HFSDBPRD/datafile','/oradata/HFSDBRED/datafile'
set LOG_FILE_NAME_CONVERT='/oradata/HFSDBPRD/onlinelog','/oradata/HFSDBRED/onlinelog','/optware/oracle/HFSDBPRD/onlinelog','/optware/oracle/HFSDBRED/onlinelog'

SQL Error: ORA-01115: IO error reading block from file (block # )

I was getting an error while trying to insert the data from a backup table.
SQL Error: ORA-30036: unable to extend segment by 8 in undo tablespace 'UND_TBS'
30036. 00000 - "unable to extend segment by %s in undo tablespace '%s'"
To solve this issue, I created a new empty datafile at the respective path.
After that, I get the error below when trying to do any SELECT/DELETE operation.
Error at Command Line:1 Column:0
Error report:
SQL Error: ORA-01115: IO error reading block from file (block # )
ORA-01110: data file 82: '/pathOfDataFile/my_newly_created_datafile.dbf'
ORA-27072: File I/O error
Additional information: 7
Additional information: 16578
01115. 00000 - "IO error reading block from file %s (block # %s)"
*Cause: Device on which the file resides is probably offline
*Action: Restore access to the device
I checked the status of the file, which is ONLINE.
Any idea how I can fix this error?
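For reference, the kind of status check mentioned above is typically done against v$datafile; a minimal sketch, using the file number 82 from the ORA-01110 line (the exact query the poster used is an assumption):
SELECT file#, name, status, enabled
FROM   v$datafile
WHERE  file# = 82;   -- file number taken from the ORA-01110 message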
