Sqoop is failing to get data from Teradata with java.io.IOException

Here is the Sqoop import I'm using to pull data from Teradata:
sqoop import -libjars jars --driver drivers --connect connection_url -m 1 --hive-overwrite --hive-import --hive-database hivedatabase --hive-table hivetable --target-dir '/user/hive/warehouse/database.db/table_name' --as-parquetfile --query "select c1,c2,c3, to_char(SOURCE_ACTIVATION_DT,'YYYY-MM-DD HH24:MI:SS') as SOURCE_ACTIVATION_DT,to_char(SOURCE_DEACTIVATION_DT,'YYYY-MM-DD HH24:MI:SS') as SOURCE_DEACTIVATION_DT,to_char(EFF_DT,'YYYY-MM-DD HH24:MI:SS') as EFF_DT,to_char(EXP_DT,'YYYY-MM-DD HH24:MI:SS') as EXP_DT,to_char(SYS_UPDATE_DTM,'YYYY-MM-DD HH24:MI:SS') as SYS_UPDATE_DTM,to_char(SYS_LOAD_DTM,'YYYY-MM-DD HH24:MI:SS') as SYS_LOAD_DTM from source_schema.table_name WHERE to_char(SYS_UPDATE_DTM,'YYYY-MM-DD HH24:MI:SS')> '2017-03-30 10:00:00' OR to_char(SYS_LOAD_DTM,'YYYY-MM-DD HH24:MI:SS') > '2017-03-30 10:00:00' AND \$CONDITIONS"
Below is the error I'm getting. The import ran fine for two days and only recently started returning this error.
17/03/29 20:07:53 INFO mapreduce.Job: map 0% reduce 0%
17/03/29 20:56:46 INFO mapreduce.Job: Task Id : attempt_1487033963691_263120_m_000000_0, Status : FAILED
Error: java.io.IOException: SQLException in nextKeyValue
at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:277)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.sql.SQLException: [Teradata JDBC Driver] [TeraJDBC 15.10.00.14] [Error 1005] [SQLState HY000] Unexpected parcel kind received: 9
at com.teradata.jdbc.jdbc_4.util.ErrorFactory.makeDriverJDBCException(ErrorFactory.java:94)
at com.teradata.jdbc.jdbc_4.util.ErrorFactory.makeDriverJDBCException(ErrorFactory.java:69)
at com.teradata.jdbc.jdbc_4.statemachine.ReceiveRecordSubState.action(ReceiveRecordSubState.java:195)
at com.teradata.jdbc.jdbc_4.statemachine.StatementReceiveState.subStateMachine(StatementReceiveState.java:311)
at com.teradata.jdbc.jdbc_4.statemachine.StatementReceiveState.action(StatementReceiveState.java:200)
at com.teradata.jdbc.jdbc_4.statemachine.StatementController.runBody(StatementController.java:137)
at com.teradata.jdbc.jdbc_4.statemachine.PreparedStatementController.run(PreparedStatementController.java:46)
at com.teradata.jdbc.jdbc_4.statemachine.StatementController.fetchRows(StatementController.java:360)
at com.teradata.jdbc.jdbc_4.TDResultSet.goToRow(TDResultSet.java:374)
at com.teradata.jdbc.jdbc_4.TDResultSet.next(TDResultSet.java:657)
at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:237)
... 12 more
When I googled around, I saw people getting the same error for different reasons. I know this is related to the time I'm using in the WHERE clause, but I'm not sure what exactly I have to change.
Thanks in advance...!!

Sqoop uses $CONDITIONS to fetch both metadata and data.
Metadata: it replaces $CONDITIONS with 1=0, so no rows are fetched, only metadata.
Data with 1 mapper: it replaces $CONDITIONS with 1=1, so all the data is fetched.
Data with multiple mappers: it replaces $CONDITIONS with a range condition for each mapper.
Try these queries in a JDBC client:
select c1,c2,c3, to_char(SOURCE_ACTIVATION_DT,'YYYY-MM-DD HH24:MI:SS') as SOURCE_ACTIVATION_DT,to_char(SOURCE_DEACTIVATION_DT,'YYYY-MM-DD HH24:MI:SS') as SOURCE_DEACTIVATION_DT,to_char(EFF_DT,'YYYY-MM-DD HH24:MI:SS') as EFF_DT,to_char(EXP_DT,'YYYY-MM-DD HH24:MI:SS') as EXP_DT,to_char(SYS_UPDATE_DTM,'YYYY-MM-DD HH24:MI:SS') as SYS_UPDATE_DTM,to_char(SYS_LOAD_DTM,'YYYY-MM-DD HH24:MI:SS') as SYS_LOAD_DTM from source_schema.table_name WHERE to_char(SYS_UPDATE_DTM,'YYYY-MM-DD HH24:MI:SS')> '2017-03-30 10:00:00' OR to_char(SYS_LOAD_DTM,'YYYY-MM-DD HH24:MI:SS') > '2017-03-30 10:00:00' AND 1=0
select c1,c2,c3, to_char(SOURCE_ACTIVATION_DT,'YYYY-MM-DD HH24:MI:SS') as SOURCE_ACTIVATION_DT,to_char(SOURCE_DEACTIVATION_DT,'YYYY-MM-DD HH24:MI:SS') as SOURCE_DEACTIVATION_DT,to_char(EFF_DT,'YYYY-MM-DD HH24:MI:SS') as EFF_DT,to_char(EXP_DT,'YYYY-MM-DD HH24:MI:SS') as EXP_DT,to_char(SYS_UPDATE_DTM,'YYYY-MM-DD HH24:MI:SS') as SYS_UPDATE_DTM,to_char(SYS_LOAD_DTM,'YYYY-MM-DD HH24:MI:SS') as SYS_LOAD_DTM from source_schema.table_name WHERE to_char(SYS_UPDATE_DTM,'YYYY-MM-DD HH24:MI:SS')> '2017-03-30 10:00:00' OR to_char(SYS_LOAD_DTM,'YYYY-MM-DD HH24:MI:SS') > '2017-03-30 10:00:00' AND 1=1
If these do not work in the JDBC client, your Sqoop command with this query can never run.
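One thing worth checking alongside those two queries: in standard SQL, AND binds more tightly than OR, so "A OR B AND $CONDITIONS" is read as "A OR (B AND $CONDITIONS)" and the substituted predicate only restricts the second branch. The sketch below illustrates this with Python's built-in sqlite3 as a stand-in for the database (it is not Teradata). The table t, its columns, and the helper substitute_conditions are made up for illustration, and this does not by itself explain the Teradata 1005 error.

```python
import sqlite3


def substitute_conditions(query: str, replacement: str) -> str:
    """Mimic the substitution described above: swap $CONDITIONS for a predicate."""
    return query.replace("$CONDITIONS", replacement)


# Hypothetical stand-in table; sqlite3 is used only to show operator precedence.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, upd TEXT, ld TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "2017-03-31 00:00:00", "2017-03-01 00:00:00"),  # first predicate only
        (2, "2017-03-01 00:00:00", "2017-03-31 00:00:00"),  # second predicate only
        (3, "2017-03-01 00:00:00", "2017-03-01 00:00:00"),  # neither predicate
    ],
)

cutoff = "2017-03-30 10:00:00"
unparenthesized = (
    f"SELECT id FROM t WHERE upd > '{cutoff}' OR ld > '{cutoff}' AND $CONDITIONS"
)
parenthesized = (
    f"SELECT id FROM t WHERE (upd > '{cutoff}' OR ld > '{cutoff}') AND $CONDITIONS"
)


def run(query: str, replacement: str) -> list:
    """Run the query after substitution and return the matching ids, sorted."""
    sql = substitute_conditions(query, replacement)
    return sorted(row[0] for row in conn.execute(sql))


# The "metadata" form (1=0) still returns a row when the OR is not parenthesized.
metadata_unparenthesized = run(unparenthesized, "1=0")
metadata_parenthesized = run(parenthesized, "1=0")
data_parenthesized = run(parenthesized, "1=1")
print(metadata_unparenthesized, metadata_parenthesized, data_parenthesized)
```

If the intent is "(updated OR loaded after the cutoff) AND $CONDITIONS", wrapping the OR branch in parentheses in the --query string keeps the substituted condition applied to the whole filter.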

Related

How can I hardcode date into Oracle query in SSIS

I am very new to Oracle Database and I am trying to hardcode the date (2020-06-30 0:00:00) into the query below. However, I get the following errors:
[valsys_TIMESERIES_VALUE [37]] Error: SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80040E14.
An OLE DB record is available. Source: "OraOLEDB" Hresult: 0x80040E14 Description: "ORA-12801: error signaled in parallel query server P006
ORA-01821: date format not recognized
ORA-02063: preceding 2 lines from VALSYS".
An OLE DB record is available. Source: "OraOLEDB" Hresult: 0x80004005 Description: "ORA-12801: error signaled in parallel query server P006
ORA-01821: date format not recognized
ORA-02063: preceding 2 lines from VALSYS".
"SELECT TS_ID,DATE_UTC,VALUE,CASE WHEN VALUE = 0 THEN '0' ELSE TO_CHAR(VALUE) END VALUE_CONV ,SUBSTITUTE_VALUE
,CASE WHEN SUBSTITUTE_VALUE = 0 THEN '0' ELSE TO_CHAR(SUBSTITUTE_VALUE) END SUBSTITUTE_VALUE_CONV ,MANUAL_VALUE, CASE WHEN MANUAL_VALUE = 0 THEN '0' ELSE TO_CHAR(MANUAL_VALUE) END MANUAL_VALUE_CONV,FEASIBLE
,VERIFIED,APPROVED,VALID_FROM,VALID_UNTIL,LAST_EXPORT,DAY_CET,COMPUTED,MARKER,TASK_UNIT_ID FROM \""+ @[$Project::Oracle_Valsys_Schemaname] + "\".\"VALSYS_TIMESERIES_VALUE\"
WHERE VALID_FROM > to_timestamp('"+ @[User::PreLET] + "', '2020-06-30 0:00:00') "
The second parameter of to_timestamp is a format mask, e.g. 'yyyy-mm-dd hh24:mi:ss', not a date value. Passing a date there is what raises ORA-01821.
Here is a sample usage:
select
to_timestamp('2020-06-30 00:00:00', 'yyyy-mm-dd hh24:mi:ss') tst
from dual;
TST
-----------------------------
30.06.2020 00:00:00,000000000
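To make the argument order concrete, here is a minimal sketch in Python that puts the value first and the mask second when building the WHERE predicate from the question. The helper valid_from_predicate and the Python format string are assumptions added for illustration; they are not part of SSIS or Oracle.

```python
from datetime import datetime

ORACLE_MASK = "yyyy-mm-dd hh24:mi:ss"  # format mask from the answer above
PYTHON_FORMAT = "%Y-%m-%d %H:%M:%S"    # assumed Python equivalent of that mask


def valid_from_predicate(value: str) -> str:
    """Build the predicate with the value first and the format mask second."""
    # Fail early (ValueError) if the value does not match the expected layout.
    datetime.strptime(value, PYTHON_FORMAT)
    return f"VALID_FROM > to_timestamp('{value}', '{ORACLE_MASK}')"


predicate = valid_from_predicate("2020-06-30 00:00:00")
print(predicate)
```

In the SSIS expression, the same idea means the variable (or the hardcoded date) goes in the first argument and the mask string in the second.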

Can varchar datatype be a timestamp in Confluent?

I'm using Confluent to implement real-time ETL.
My data source is Oracle. Every table has a column named ts whose data type is varchar, but the data in this column is in YYYY-MM-DD HH24:MI:SS format.
Can I use this column as the timestamp in the Confluent Kafka JDBC connector?
How should I configure the xxxxx.properties file?
mode=timestamp
query= select to_date(a.ts,'yyyy-mm-dd hh24:mi:ss') tsinc,a.* from TEST_CORP a
poll.interval.ms=1000
timestamp.column.name=tsinc
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
query=select * from NFSN.BD_CORP
mode=timestamp
poll.interval.ms=3000
timestamp.column.name=TS
topic.prefix=t_
validate.non.null=false
then I get this error:
[2018-12-25 14:39:59,756] INFO After filtering the tables are: (io.confluent.connect.jdbc.source.TableMonitorThread:175)
[2018-12-25 14:40:01,383] DEBUG Checking for next block of results from TimestampIncrementingTableQuerier{table=null, query='select * from NFSN.BD_CORP', topicPrefix='t_', incrementingColumn='', timestampColumns=[TS]} (io.confluent.connect.jdbc.source.JdbcSourceTask:291)
[2018-12-25 14:40:01,386] DEBUG TimestampIncrementingTableQuerier{table=null, query='select * from NFSN.BD_CORP', topicPrefix='t_', incrementingColumn='', timestampColumns=[TS]} prepared SQL query: select * from NFSN.BD_CORP WHERE "TS" > ? AND "TS" < ? ORDER BY "TS" ASC (io.confluent.connect.jdbc.source.TimestampIncrementingTableQuerier:161)
[2018-12-25 14:40:01,386] DEBUG executing query select CURRENT_TIMESTAMP from dual to get current time from database (io.confluent.connect.jdbc.dialect.OracleDatabaseDialect:462)
[2018-12-25 14:40:01,388] DEBUG Executing prepared statement with timestamp value = 1970-01-01 00:00:00.000 end time = 2018-12-25 06:40:43.828 (io.confluent.connect.jdbc.source.TimestampIncrementingCriteria:162)
[2018-12-25 14:40:01,389] ERROR Failed to run query for table TimestampIncrementingTableQuerier{table=null, query='select * from NFSN.BD_CORP', topicPrefix='t_', incrementingColumn='', timestampColumns=[TS]}: {} (io.confluent.connect.jdbc.source.JdbcSourceTask:314)
java.sql.SQLDataException: ORA-01843: not a valid month
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:447)
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:396)
at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:951)
at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:513)
at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:227)
at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:531)
at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:208)
at oracle.jdbc.driver.T4CPreparedStatement.executeForDescribe(T4CPreparedStatement.java:886)
at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1175)
at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1296)
at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3613)
at oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:3657)
at oracle.jdbc.driver.OraclePreparedStatementWrapper.executeQuery(OraclePreparedStatementWrapper.java:1495)
at io.confluent.connect.jdbc.source.TimestampIncrementingTableQuerier.executeQuery(TimestampIncrementingTableQuerier.java:168)
at io.confluent.connect.jdbc.source.TableQuerier.maybeStartQuery(TableQuerier.java:88)
at io.confluent.connect.jdbc.source.TimestampIncrementingTableQuerier.maybeStartQuery(TimestampIncrementingTableQuerier.java:60)
at io.confluent.connect.jdbc.source.JdbcSourceTask.poll(JdbcSourceTask.java:292)
at org.apache.kafka.connect.runtime.WorkerSourceTask.poll(WorkerSourceTask.java:244)
at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:220)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
[2018-12-25 14:40:01,390] DEBUG Resetting querier TimestampIncrementingTableQuerier{table=null, query='select * from NFSN.BD_CORP', topicPrefix='t_', incrementingColumn='', timestampColumns=[TS]} (io.confluent.connect.jdbc.source.JdbcSourceTask:332)
^C[2018-12-25 14:40:03,826] INFO Kafka Connect stopping (org.apache.kafka.connect.runtime.Connect:65)
[2018-12-25 14:40:03,827] INFO Stopping REST server (org.apache.kafka.connect.runtime.rest.RestServer:223)

Error in getting data from Oracle to hive using sqoop

I am running the following sqoop query:
sqoop import --connect jdbc:oracle:thin:@ldap://oid:389/ewsop000,cn=OracleContext,dc=****,dc=com \
--table ngprod.ewt_payment_ng --where "d_last_updt_ts >= to_timestamp('11/01/2013 11:59:59.999999 PM', 'MM/DD/YYYY HH:MI:SS.FF6 AM')" \
AND "d_last_updt_ts <= to_timestamp('11/10/2013 11:59:59.999999 PM', 'MM/DD/YYYY HH:MI:SS.FF6 AM')" --username ***** --P \
--columns N_PYMNT_ID,D_last_updt_Ts,c_pymnt_meth,c_rcd_del,d_Create_ts \
--hive-import --hive-table payment_sample_table2
The schema for table payment_sample_table2 already exists in Hive. The import runs fine if I do not use
AND "d_last_updt_ts <= to_timestamp('11/10/2013 11:59:59.999999 PM', 'MM/DD/YYYY HH:MI:SS.FF6 AM')"
Can someone tell me why, or if there's any other way to get the range of data?
Please specify the exact error. In any case, put the "AND ..." part inside the same pair of double quotes, and on the same line, as the preceding part of the --where clause. As shown above, the command line is badly formatted: the shell passes the second condition as separate arguments, so the problem has nothing to do with the query itself.
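To see why the quoting matters, the sketch below uses Python's shlex module to split an abbreviated version of the command the way a POSIX shell would. The connection, credentials, and column options are left out for brevity, and the commands are placed on one line each; only the --where quoting is being illustrated.

```python
import shlex

LOWER = "d_last_updt_ts >= to_timestamp('11/01/2013 11:59:59.999999 PM', 'MM/DD/YYYY HH:MI:SS.FF6 AM')"
UPPER = "d_last_updt_ts <= to_timestamp('11/10/2013 11:59:59.999999 PM', 'MM/DD/YYYY HH:MI:SS.FF6 AM')"

# Quoting as in the question: AND sits outside the double quotes.
broken = f'sqoop import --table ngprod.ewt_payment_ng --where "{LOWER}" AND "{UPPER}" --hive-import'
# Quoting as the answer suggests: the whole condition inside one pair of quotes.
fixed = f'sqoop import --table ngprod.ewt_payment_ng --where "{LOWER} AND {UPPER}" --hive-import'

broken_args = shlex.split(broken)
fixed_args = shlex.split(fixed)

# The value Sqoop would receive for --where is the argument right after the flag.
where_broken = broken_args[broken_args.index("--where") + 1]
where_fixed = fixed_args[fixed_args.index("--where") + 1]

print(where_broken)                                   # only the lower bound
print(broken_args[broken_args.index("--where") + 2])  # a stray AND argument
print(where_fixed)                                    # both bounds in one argument
```

With the original quoting, --where receives only the lower bound, and AND plus the upper bound arrive as two extra arguments that Sqoop does not expect.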

JPA 2 Criteria Day from date

I have a field in a table that stores a date. I'd like to select all records whose date falls on day 5 of the month.
After a lot of research I arrived at the code below:
Predicate dayValue = cb.equal(cb.function("day", Integer.class, test.<Date>get(Test_.dateInit)), now.get(Calendar.DAY_OF_MONTH) );
But I'm using an Oracle database, and it doesn't have a day function:
[EL Warning]: 2013-01-14 11:51:08.001--UnitOfWork(23011228)--Thread(Thread[main,5,main])--Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.1.2.v20101206-r8635): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: java.sql.SQLException: ORA-00904: "DAY": invalid identifier
Is there another way to do this select?
Thanks
I found a way:
use Oracle's to_char function through the function method:
Predicate dayValue = cb.equal(cb.function("to_char", Integer.class, test.<Date>get(Test_.dateInit),cb.parameter(String.class, "dayFormat")), now.get(Calendar.DAY_OF_MONTH) );
...
q.setParameter("dayFormat", "DD");
In Oracle you can use
select to_char (date '2013-01-14', 'D') d from dual;
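to get the day-of-week number. The above outputs 1 for Monday when the session's NLS_TERRITORY starts the week on Monday; with a territory such as AMERICA, where the week starts on Sunday, it outputs 2. Note that 'D' is the day of the week, while 'DD' is the day of the month.
If you want to see the name Monday, just change it to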
select to_char (date '2013-01-14', 'Day') d from dual;
the above will output Monday
hope that helps.
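Since the two answers use different masks, here is a small Python sketch of the distinction between day of month (Oracle 'DD'), day of week (Oracle 'D'), and day name (Oracle 'Day'). It is only an analogy: Python's isoweekday always counts Monday as 1, whereas Oracle's 'D' depends on NLS_TERRITORY, and the helper is_on_day is a hypothetical name.

```python
from datetime import date

d = date(2013, 1, 14)

day_of_month = d.day             # analogous to to_char(d, 'DD')
iso_weekday = d.isoweekday()     # 1 = Monday ... 7 = Sunday (fixed ISO numbering)
weekday_name = d.strftime("%A")  # analogous to 'Day'; depends on the locale


def is_on_day(value: date, day: int) -> bool:
    """True when the date falls on the given day of the month."""
    return value.day == day


print(day_of_month, iso_weekday, weekday_name)
print(is_on_day(date(2013, 1, 5), 5))
```

For the original question (records on day 5), the day-of-month form is the one that applies.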

Oracle current_timestamp to seconds conversion

We are using an Oracle database.
In our table, the timestamp is stored as seconds since 1970. How can I convert the timestamp returned by current_timestamp to seconds?
This would do it:
select round((cast(current_timestamp as date) - date '1970-01-01')*24*60*60) from dual
Though I wouldn't use current_timestamp if I was only interested in seconds, I would use SYSDATE:
select round((SYSDATE - date '1970-01-01')*24*60*60) from dual
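The arithmetic in both queries is "days since 1970-01-01, times 24*60*60". A minimal Python sketch of the same calculation follows; the function name seconds_since_1970 is made up for illustration. One caveat to keep in mind: both current_timestamp (session time zone) and SYSDATE (database server time zone) give local time, so unless that zone is UTC the result differs from Unix time by the zone offset.

```python
from datetime import datetime, timedelta

UNIX_EPOCH = datetime(1970, 1, 1)


def seconds_since_1970(moment: datetime) -> int:
    """Mirror (date - date '1970-01-01') * 24*60*60, rounded to whole seconds."""
    # Dividing by one day gives fractional days, like Oracle date subtraction.
    fractional_days = (moment - UNIX_EPOCH) / timedelta(days=1)
    return round(fractional_days * 24 * 60 * 60)


example = seconds_since_1970(datetime(2018, 12, 18, 17, 47, 29))
print(example)
```

The input here is a naive datetime, so whatever zone it represents carries straight into the result, which is the same behaviour as the SQL above.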
Maybe not completely relevant, but I had to solve the reverse problem: Oracle stores a numeric stamp in V$RMAN_STATUS and V$RMAN_OUTPUT, and I had to convert it to a date/timestamp. To my surprise, the magic date there is not 1970-01-01 but 1987-07-07. Looking at Oracle's history, the closest event I can think of is when they ported Oracle products to UNIX. Is this right?
Here's my SQL:
SELECT /*+ rule */
to_char(min(stamp)/(24*60*60) + date '1987-07-07', 'DD-MON-YYYY HH24:MI:SS') start_tm
, to_char(to_char(max(stamp)/(24*60*60) + date '1987-07-07', 'DD-MON HH24:MI:SS')) end_tm
FROM V$RMAN_STATUS
START WITH (RECID, STAMP) =
(SELECT MAX(session_recid),MAX(session_stamp) FROM V$RMAN_OUTPUT)
CONNECT BY PRIOR RECID = parent_recid ;
I needed to send timestamp to GrayLog via GELF from Oracle DB. I tried different versions and solutions but only one worked correctly.
SQL:
SELECT REPLACE((CAST(dat AS DATE) - TO_DATE('19700101', 'YYYYMMDD')) * 86400 + MOD(EXTRACT(SECOND FROM dat), 1), ',', '.') AS millis
FROM (SELECT SYSTIMESTAMP AT TIME ZONE 'GMT' AS dat FROM dual)
The result for a SYSTIMESTAMP of
2018/12/18 19:47:29,080988 +02:00
will be
1545155249.080988
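The worked example above can be cross-checked with a short Python sketch that converts the same local time (+02:00) to seconds since the Unix epoch in UTC. Integer arithmetic on the timedelta is used to avoid floating-point rounding in the fractional part; the variable names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)

# 2018/12/18 19:47:29,080988 +02:00 from the example above
local = datetime(2018, 12, 18, 19, 47, 29, 80988, tzinfo=timezone(timedelta(hours=2)))

# Subtracting aware datetimes accounts for the +02:00 offset automatically.
delta = local - EPOCH_UTC
whole_seconds = delta.days * 86400 + delta.seconds
epoch_text = f"{whole_seconds}.{delta.microseconds:06d}"
print(epoch_text)
```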
