We are getting errors while loading large volumes of data into Greenplum through DataStage jobs.
There are multiple jobs running sequentially. No particular job fails; the failures appear random. If Job1 fails on the first day, Job2 may fail on the second day.
We have also observed that it only impacts the jobs that have to load a high volume of data.
Please find the errors we have got so far.
Day 1
Message:
STG_DEPS_G,0: The following SQL statement failed: INSERT INTO GPCC_ET_20211114015751397_14836_2 SELECT DEPT, DEPT_NAME, BUYER, MERCH, PROFIT_CALC_TYPE, PURCHASE_TYPE, GROUP_NO, BUD_INT, BUD_MKUP, TOTAL_MARKET_AMT, MARKUP_CALC_TYPE, OTB_CALC_TYPE, MAX_AVG_COUNTER, AVG_TOLERANCE_PCT, DEPT_VAT_INCL_IND, CREATE_ID, CREATE_DATETIME FROM staging.STG_DEPS. The statement reported the following reason: [SQLCODE=08S01][Native=373,254] [IBM (DataDirect OEM)][ODBC Greenplum Wire Protocol driver][Greenplum]ERROR: http response code 501 from gpfdist (gpfdist://DDCETLMIG:8000/DDCETLMIG_14836_gpw_11_3_20211114015751366): HTTP/1.0 501 pipe is busy, close the pipe and try again (seg0 192.168.199.10:6000 pid=25824)(File url_curl.c; Line 474; Routine check_response; ) (CC_GPCommon::checkThreadStatusThrow, file CC_GPCommon.cpp, line 808)
Day 2
STG_RPM_ZONE,0: The following SQL statement failed: INSERT INTO GPCC_ET_20211114093430218_8212_0 SELECT ZONE_ID, ZONE_DISPLAY_ID, ZONE_GROUP_ID, NAME, CURRENCY_CODE, BASE_IND, LOCK_VERSION FROM STAGING.STG_RPM_ZONE. The statement reported the following reason: [SQLCODE=08S01][Native=373,254] [IBM (DataDirect OEM)][ODBC Greenplum Wire Protocol driver][Greenplum]ERROR: http response code 501 from gpfdist (gpfdist://DDCETLMIG:8004/DDCETLMIG_8212_gpw_0_0_20211114093430186): HTTP/1.0 501 pipe is busy, close the pipe and try again (seg1 192.168.199.11:6000 pid=26726)(File url_curl.c; Line 474; Routine check_response; ) (CC_GPCommon::checkThreadStatusThrow, file CC_GPCommon.cpp, line 808)
Day 3
Event type:Fatal
Timestamp:11/15/2021 9:27:36 AM
Message:
SUB_CLASS,3: APT_PMMessagePort::dispatch:ERROR: header = 04F02E20SUBPROC_SUPPORT_EOW, savedDispatchPosition = 04F02E20, currentDispatchPosition_ = 04F02E1FS, currentInputPosition_ = 04F02E58, buffer_ = 04F02E20, this = 04EEA1E0
Day 4
Message:
STG_GROUPS_G,0: The following SQL statement failed: INSERT INTO GPCC_ET_20211115015013039_2400_0 SELECT GROUP_NO, GROUP_NAME, BUYER, MERCH, DIVISION, CREATE_ID, CREATE_DATETIME FROM staging.STG_GROUPS. The statement reported the following reason: [SQLCODE=08S01][Native=373,254] [IBM (DataDirect OEM)][ODBC Greenplum Wire Protocol driver][Greenplum]ERROR: http response code 501 from gpfdist (gpfdist://DDCETLMIG:8009/DDCETLMIG_2400_gpw_1_1_20211115015013023): HTTP/1.0 501 pipe is busy, close the pipe and try again (seg5 192.168.199.12:6001 pid=1167)(File url_curl.c; Line 474; Routine check_response; ) (CC_GPCommon::checkThreadStatusThrow, file CC_GPCommon.cpp, line 808)
A couple of questions:
You indicated there are multiple jobs running sequentially. Does that mean there are multiple concurrent jobs, each running a set of sequential jobs, or is there one job running multiple sequential steps? If it is the first, I would check to make sure no jobs are using the same gpfdist ports.
Is DataStage connecting through a proxy server? That could potentially be causing a problem (a common cause of 501 errors, according to a Google search).
Finally, what is the throughput speed of the NIC or NICs used to connect DataStage/gpfdist to Greenplum? Also, what is the interconnect speed between the segment hosts in Greenplum? If either of these is less than 10 Gb, your network may not be able to support really high throughput.
When importing a file into Greenplum, one line fails and the whole file fails to import. Is there a way to skip the bad line and import the rest of the data into Greenplum successfully?
Here is my SQL statement and the error messages:
copy cjh_test from '/gp_wkspace/outputs/base_tables/error_data_test.csv' using delimiters ',';
ERROR: invalid input syntax for integer: "FE00F760B39BD3756BCFF30000000600"
CONTEXT: COPY cjh_test, line 81, column local_city: "FE00F760B39BD3756BCFF30000000600"
Greenplum has an extension to the COPY command that lets you log errors and set the number of errors that can occur without stopping the load. Here is an example from the documentation for the COPY command:
COPY sales FROM '/home/usr1/sql/sales_data' LOG ERRORS
SEGMENT REJECT LIMIT 10 ROWS;
That tells COPY that 10 bad rows can be ignored without stopping the load. The reject limit can be a number of rows or a percentage of the load file. You can check the full syntax in psql with: \h copy
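With that form of LOG ERRORS, recent Greenplum versions keep the rejected rows in an internal error log that you can inspect afterwards, for example (assuming the load target was the sales table):
-- Show the rows rejected by the load, with the error message and raw data for each
SELECT * FROM gp_read_error_log('sales');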
If you are loading a very large file into Greenplum, I would suggest looking at gpload or gpfdist (which also support the segment reject limit syntax). COPY is single-threaded through the master server, whereas gpload/gpfdist load the data in parallel to all segments. COPY will be faster for smaller load files, and the others will be faster when the load file(s) contain millions of rows.
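For the gpfdist route, the same reject-limit clause goes on the external table definition. A minimal sketch, with the host, port, file name, and target table made up for illustration:
-- Hypothetical external table reading a CSV file served by gpfdist;
-- bad rows are logged and the load only aborts after 10 rejected rows.
CREATE EXTERNAL TABLE ext_sales (LIKE sales)
LOCATION ('gpfdist://etlhost:8081/sales_data.csv')
FORMAT 'CSV' (DELIMITER ',')
LOG ERRORS SEGMENT REJECT LIMIT 10 ROWS;
INSERT INTO sales SELECT * FROM ext_sales;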
I am trying to move data from an Oracle instance to a PostgreSQL RDS instance using AWS DMS. I am only doing a full load operation, and I have disabled all the foreign keys on the target. I also made sure that the datatypes are not mismatched between columns of the same tables. I tried both 'Do Nothing' and 'Truncate' for the target table preparation mode, and when I run the task, several tables fail with the error messages below:
[TARGET_LOAD ]E: Command failed to load data with exit error code 1, Command output: <truncated> [1020403] (csv_target.c:981)
[TARGET_LOAD ]E: Failed to wait for previous run [1020403] (csv_target.c:1578)
[TARGET_LOAD ]E: Failed to load data from csv file. [1020403] (odbc_endpoint_imp.c:5648)
[TARGET_LOAD ]E: Handling End of table 'public'.'SKEWED_VALUES' loading failed by subtask 6 thread 1 [1020403] (endpointshell.c:2416)
DMS doesn't give the underlying error information, and I am not able to understand what the above error messages mean.
When I use 'Drop tables on target' for the target table preparation mode, it works, but it creates the columns with different datatypes, which I don't want.
Any help would be appreciated.
To troubleshoot my case, I created a copy of the task that only loaded the one problem table, and upped all the logging severities to "Detailed debug". Then I was able to see this:
[TARGET_LOAD ]D: RetCode: SQL_SUCCESS_WITH_INFO SqlState: 42622 NativeError: -1 Message: NOTICE: identifier "diagnosticinterpretationrequestdata_diagnosticinterpretationcode" will be truncated to "diagnosticinterpretationrequestdata_diagnosticinterpretationcod" (ar_odbc_stmt.c:4720)
In the RDS logs for the target DB I found:
2021-10-11 14:30:36 UTC:...:[19259]:ERROR: invalid input syntax for integer: ""
2021-10-11 14:30:36 UTC:...:[19259]:CONTEXT: COPY diagnosticinterpretationrequest, line 1, column diagnosticinterpretationrequestdata_diagnosticinterpretationcod: ""
2021-10-11 14:30:36 UTC:...:[19259]:STATEMENT: COPY "myschema"."diagnosticinterpretationrequest" FROM STDIN WITH DELIMITER ',' CSV NULL 'attNULL' ESCAPE '\'
I found that if I added a table mapping rule to explicitly rename the column so that its name fits within Postgres's identifier length limit (identifiers longer than 63 bytes are truncated), then things ran OK:
{
    "rule-type": "transformation",
    "rule-id": "1",
    "rule-name": "1",
    "rule-target": "column",
    "object-locator": {
        "schema-name": "%",
        "table-name": "%",
        "column-name": "diagnosticinterpretationrequestdata_diagnosticinterpretationcode"
    },
    "rule-action": "rename",
    "value": "diagnosticinterpretationrequestdata_diagnosticinterpretationcod",
    "old-value": null
},
I am getting an error while doing auxiliary cloning of a database across two different servers from an RMAN backup, using the command below:
duplicate database to "HFSDBRED" backup location '/orabackup/RMAN/HFSDBRED_BKP' nofilenamecheck set DB_FILE_NAME_CONVERT=('/oradata/HFSDBPRD/datafile','/oradata/HFSDBRED/datafile') set LOG_FILE_NAME_CONVERT=('/oradata/HFSDBPRD/onlinelog','/oradata/HFSDBRED/onlinelog','/optware/oracle/HFSDBPRD/onlinelog','/optware/oracle/HFSDBRED/onlinelog');
Error:
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-00558: error encountered while parsing input commands
RMAN-01009: syntax error: found "identifier": expecting one of: "backup, db_file_name_convert, device, dorecover, force, from, high, logfile, nofilenamecheck, noredo, noresume, open, password, pfile, skip readonly, skip, spfile, tablespace, to restore point, undo, until restore point, until, ;"
RMAN-01008: the bad identifier was: LOG_FILE_NAME_CONVERT
RMAN-01007: at line 3 column 1 file: standard input
You should use the DB_FILE_NAME_CONVERT and LOG_FILE_NAME_CONVERT values without parentheses, as follows:
set DB_FILE_NAME_CONVERT='/oradata/HFSDBPRD/datafile','/oradata/HFSDBRED/datafile'
set LOG_FILE_NAME_CONVERT='/oradata/HFSDBPRD/onlinelog','/oradata/HFSDBRED/onlinelog','/optware/oracle/HFSDBPRD/onlinelog','/optware/oracle/HFSDBRED/onlinelog'
I was getting an error while trying to insert data from a backup table.
SQL Error: ORA-30036: unable to extend segment by 8 in undo tablespace 'UND_TBS'
30036. 00000 - "unable to extend segment by %s in undo tablespace '%s'"
To solve this issue, I created a new empty datafile at the respective path.
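For reference, adding a datafile to the undo tablespace is done with a statement along these lines (the size below is a placeholder, not the exact value used):
-- Placeholder size: add a new datafile to the undo tablespace
ALTER TABLESPACE UND_TBS
  ADD DATAFILE '/pathOfDataFile/my_newly_created_datafile.dbf' SIZE 2G;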
After that, I am getting the error below when trying to do any SELECT or DELETE operation.
Error at Command Line:1 Column:0
Error report:
SQL Error: ORA-01115: IO error reading block from file (block # )
ORA-01110: data file 82: '/pathOfDataFile/my_newly_created_datafile.dbf'
ORA-27072: File I/O error
Additional information: 7
Additional information: 16578
01115. 00000 - "IO error reading block from file %s (block # %s)"
*Cause: Device on which the file resides is probably offline
*Action: Restore access to the device
I checked the status of the file, which is ONLINE.
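The check was along these lines (a generic example, not necessarily the exact query used; file number 82 is taken from the error above):
-- Check the status of data file 82 as reported in the data dictionary
SELECT file#, name, status FROM v$datafile WHERE file# = 82;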
Any idea how I can fix this error?