Teradata export fails when using TDCH - Hadoop

When exporting 2 billion+ records from Hadoop into Teradata using TDCH (Teradata Connector for Hadoop) with the command below (method "batch.insert"):
hadoop jar teradata-connector-1.3.2-hadoop210.jar com.teradata.connector.common.tool.ConnectorExportTool \
-D mapreduce.job.queuename=<queuename> \
-libjars <LIB_JARS_PATH> \
-classname com.teradata.jdbc.TeraDriver \
-url <jdbc_connection_string> \
-username <user_id> \
-password "********" \
-jobtype hive \
-sourcedatabase <hive_src_dbase> \
-sourcetable <hive_src_table> \
-fileformat orcfile \
-stagedatabase <stg_db_in_tdata> \
-stagetablename <stg_tbl_in_tdata> \
-targettable <target_tbl_in_tdata> \
-nummappers 25 \
-batchsize 13000 \
-method batch.insert \
-usexviews false \
-keepstagetable true \
-queryband '<queryband>'
Data loads successfully into the stage table, but the export job then fails with "Connection Reset" before the records from the stage table are inserted into the target table.
Can someone please help me identify the reason for this and how to fix it?

Related

Exporting data from Teradata to HDFS using TDCH

I'm trying to export a table from Teradata into a file in my HDFS using TDCH.
I'm using the parameters below:
hadoop jar $TDCH_JAR com.teradata.connector.common.tool.ConnectorImportTool \
-libjars $LIB_JARS \
-Dmapred.job.queue.name=default \
-Dtez.queue.name=default \
-Dmapred.job.name=TDCH \
-classname com.teradata.jdbc.TeraDriver \
-url jdbc:teradata://$ipServer/logmech=ldap,database=$database,charset=UTF16 \
-jobtype hdfs \
-fileformat textfile \
-separator ',' \
-enclosedby '"' \
-targettable ${targetTable} \
-username ${userName} \
-password ${password} \
-sourcequery "select * from ${database}.${targetTable}" \
-nummappers 1 \
-sourcefieldnames "" \
-targetpaths ${targetPaths}
It's working, but I need the headers in the file, and when I add the parameter:
-targetfieldnames "ID","JOB","DESC","DT","REG" \
It doesn't work; the file isn't even generated anymore.
Can anyone help me?
The -targetfieldnames option is only valid for -jobtype hive.
It does not put headers in the HDFS file; it specifies Hive column names.
(There is no option to prefix CSV with a header record.)
Also the value supplied for -targetfieldnames should be a single string like "ID,JOB,DESC,DT,REG" rather than a list of strings.
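Since TDCH itself won't write a header record, one possible workaround (a rough sketch, not a TDCH feature; the paths and the column list are only illustrative) is to prepend a header to the extracted part files after the import finishes:
# Hypothetical post-processing step: build a one-line header in HDFS, then
# concatenate it with the TDCH output into a single file that has a header.
echo 'ID,JOB,DESC,DT,REG' | hdfs dfs -put -f - /tmp/header.csv
hdfs dfs -cat /tmp/header.csv ${targetPaths}/part-* | hdfs dfs -put -f - ${targetPaths}_with_header/data.csv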

Unclear error report from RRDTool graphing script

When one version of a set of scripts that apply RRDTool runs fine, you try more of the same...
I made a version of the Lua script which now collects power/energy info, and the related file create_pipower1A_graph.sh is a direct derivative of the error-free sh file described in "RRDTool, How to get png-files by means of os-execute-call from lua-script?".
The derivative sh-file should produce a graph with the output of 3 inverters and the parallel consumption.
That sh-file for graphic output is below.
#!/bin/bash
rrdtool graph /home/pi/pipower1.png \
DEF:Pwr_MAC=/home/pi/pipower1.rrd:Power0430:AVERAGE \
DEF:Pwr_SAJ=/home/pi/pipower1.rrd:Power1530:AVERAGE \
DEF:Pwr_STECA=/home/pi/pipower1.rrd:Power2950:AVERAGE \
DEF:Pwr_Cons=/home/pi/pipower1.rrd:Power_Cons:AVERAGE \
LINE1:Pwr_MAC#ff0000:Output Involar \
LINE1:Pwr_SAJ#0000ff:Output SAJ1.5 \
LINE1:Pwr_STECA#5fd00b:Output STECA \
LINE1:Pwr_Cons#00ffff:Consumption \
COMMENT:"\t\t\t\t\t\t\l" \
COMMENT:"\t\t\t\t\t\t\l" \
GPRINT:Pwr_MAC:LAST:"Output_Involar Latest\: %2.1lf" \
GPRINT:Pwr_MAC:MAX:" Max.\: %2.1lf" \
GPRINT:Pwr_MAC:MIN:" Min.\: %2.1lf" \
COMMENT:"\t\t\t\t\t\t\l" \
GPRINT:Pwr_SAJ:LAST:"Output SAJ1.5k Latest\: %2.1lf" \
GPRINT:Pwr_SAJ:MAX:" Max.\: %2.1lf" \
GPRINT:Pwr_SAJ:MIN:" Min.\: %2.1lf" \
COMMENT:"\t\t\t\t\t\t\l" \
GPRINT:Pwr_STECA:LAST:"Output STECA Latest\: %2.1lf" \
GPRINT:Pwr_STECA:MAX:" Max.\: %2.1lf" \
GPRINT:Pwr_STECA:MIN:" Min.\: %2.1lf" \
COMMENT:"\t\t\t\t\t\t\l" \
GPRINT:Pwr_Cons:LAST:"Consumption Latest\: %2.1lf" \
GPRINT:Pwr_Cons:MAX:" Max.\: %2.1lf" \
GPRINT:Pwr_Cons:MIN:" Min.\: %2.1lf" \
COMMENT:"\t\t\t\t\t\t\l" \
--width 700 --height 400 \
--title="Graph B: Power Production & Consumption for last 24 hour" \
--vertical-label="Power(W)" \
--watermark "`date`"
The Lua script again runs without errors: the rrd file is periodically updated and the graph generation is invoked, but no graph appears! Tested on two different Raspberry Pis, with no difference in behaviour.
Running the sh file create_pipower1A_graph.sh from the command line produces the following errors.
pi@raspberrypi:~$ sudo /home/pi/create_pipower1A_graph.sh
ERROR: 'I' is not a valid function name
pi@raspberrypi:~$ ./create_pipower1A_graph.sh
ERROR: 'I' is not a valid function name
Question: I'm puzzled, because nowhere in the sh file is an 'I' used as a function command. Any explanation, or a hint for remedying this error?
Your problem is here:
LINE1:Pwr_MAC#ff0000:Output Involar \
LINE1:Pwr_SAJ#0000ff:Output SAJ1.5 \
LINE1:Pwr_STECA#5fd00b:Output STECA \
LINE1:Pwr_Cons#00ffff:Consumption \
These arguments need to be quoted because they contain spaces (and hash symbols): the shell splits the unquoted "Output Involar" into two words, so rrdtool receives "Involar" as a separate, malformed graph argument, which triggers the complaint about 'I'.
LINE1:"Pwr_MAC#ff0000:Output Involar" \
LINE1:"Pwr_SAJ#0000ff:Output SAJ1.5" \
LINE1:"Pwr_STECA#5fd00b:Output STECA" \
LINE1:"Pwr_Cons#00ffff:Consumption" \

Sqoop export creating duplicate entries in a table without a primary key

I have a table with the columns department_id, department_name, LastModifieddate.
When I run the command below:
sqoop export \
--connect "jdbc:mysql://ip-172-31-13-154:3306/sqoopex" \
--username sqoopuser \
--password NHkkP876rp \
--table dep_prasad \
--input-fields-terminated-by '|' \
--input-lines-terminated-by '\n' \
--export-dir /user/venkateswarlujvs2821/dep_prasad/ \
--num-mappers 2 \
--outdir /user/venkateswarlujvs2821/dep_prasad
It works fine and inserts the records.
When I modify the file in HDFS, add some more records, and try to export it again, it inserts duplicate entries into MySQL.
I am using the following Sqoop command the second time:
sqoop export \
--connect "jdbc:mysql://ip-172-31-13-154:3306/sqoopex" \
--username sqoopuser \
--password NHkkP876rp \
--table dep_prasad \
--input-fields-terminated-by '|' \
--input-lines-terminated-by '\n' \
--update-key department_id \
--update-mode allowinsert \
--export-dir /user/venkateswarlujvs2821/dep_prasad/ \
--num-mappers 2 \
--outdir /user/venkateswarlujvs2821/dep_prasad
Note: my table does NOT have a primary key.
I want to update only the new records; how can I do that?
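For what it's worth, --update-mode allowinsert relies on the database's upsert support (for MySQL this is INSERT ... ON DUPLICATE KEY UPDATE), so without a primary or unique key on the --update-key column MySQL cannot recognise existing rows and simply inserts them again. A minimal sketch, assuming department_id is really meant to be unique, would be to add a unique index before re-running the export:
# Sketch only: add a unique key on the update column so MySQL's upsert can
# match existing rows (host, credentials and names taken from the question).
mysql -h ip-172-31-13-154 -u sqoopuser -p sqoopex \
  -e "ALTER TABLE dep_prasad ADD UNIQUE KEY uk_department_id (department_id);"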

sqoop hive import with partitions

I have some Sqoop jobs importing into Hive that I want to partition, but I can't get it to work. The import itself succeeds: the table is sqooped, it's visible in Hive, and there's data, but the partition parameters I'm expecting don't appear when I describe the table. I HAVE sqooped this table as a CSV, created an external Parquet table, and inserted the data into that (which works), but I want to avoid the extra steps if possible. Here's my current code. Am I missing something, or am I trying to do the impossible? Thanks!
sqoop import -Doraoop.import.hint=" " \
--options-file /home/[user]/pass.txt \
--verbose \
--connect jdbc:oracle:thin:@ldap://oid:389/cn=OracleContext,dc=[employer],dc=com/SQSOP051 \
--username [user] \
--num-mappers 10 \
--hive-import \
--query "select DISC_PROF_SK_ID, CLM_RT_DISC_IND, EASY_PAY_PLN_DISC_IND, TO_CHAR(L40_ATOMIC_TS,'YYYY') as YEAR, TO_CHAR(L40_ATOMIC_TS,'MM') as MONTH from ${DataSource[index]}.$TableName where \$CONDITIONS" \
--hive-database [dru_user] \
--hcatalog-partition-keys YEAR \
--hcatalog-partition-values '2015' \
--target-dir hdfs://nameservice1/data/res/warehouse/finance/[dru_user]/Claims_Data/$TableName \
--hive-table $TableName'testing' \
--split-by ${SplitBy[index]} \
--delete-target-dir \
--direct \
--null-string '\\N' \
--null-non-string '\\N' \
--as-parquetfile
You can replace the options file with --password-file. However, that will not solve the partition problem. For the partition problem, you can try creating the partitioned table $TableName first, before the import (a sketch of such a DDL follows the command below).
sqoop import -Doraoop.import.hint=" " \
--password-file /home/[user]/pass.txt \
--verbose \
--connect jdbc:oracle:thin:@ldap://oid:389/cn=OracleContext,dc=[employer],dc=com/SQSOP051 \
--username [user] \
--num-mappers 10 \
--hive-import \
--query "SELECT disc_prof_sk_id,
clm_rt_disc_ind,
easy_pay_pln_disc_ind,
To_char(l40_atomic_ts,'YYYY') AS year,
To_char(l40_atomic_ts,'MM') AS month
FROM ${DataSource[index]}.$TableName
WHERE \$CONDITIONS" \
--hcatalog-database [dru_user] \
--hcatalog-partition-keys YEAR \
--hcatalog-partition-values '2015' \
--target-dir hdfs://nameservice1/data/res/warehouse/finance/[dru_user]/Claims_Data/$TableName \
--hcatalog-table $TableName \
--split-by ${SplitBy[index]} \
--delete-target-dir \
--direct \
--null-string '\\N' \
--null-non-string '\\N' \
--as-parquetfile
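If you go that route, a minimal sketch of pre-creating the partitioned table (column types are guesses based on the query; adjust them to the real Oracle schema) could look like this:
# Hypothetical DDL for the pre-created, partitioned target table; YEAR is the
# partition column, so it is not listed among the regular columns.
hive -e "CREATE TABLE [dru_user].${TableName} (
  disc_prof_sk_id BIGINT,
  clm_rt_disc_ind STRING,
  easy_pay_pln_disc_ind STRING,
  month STRING)
PARTITIONED BY (year STRING)
STORED AS PARQUET;"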

Run a sqoop job on a specific queue

I'm trying to create a Sqoop job that runs in a specific queue, but it doesn't work.
I've tried two things:
1st: declare the queue in the job creation
sqoop job \
--create myjob \
-- import \
--connect jdbc:teradata://RCT/DATABASE=MYDB \
-Dmapred.job.queue.name=shortduration \
--driver com.teradata.jdbc.TeraDriver \
--username DBUSER -P \
--query "$query" \
--target-dir /data/source/dest/$i \
--check-column DAT_CRN_AGG \
--incremental append \
--last-value 2001-01-01 \
--split-by NUM_CTR
But it throws an argument-parsing error due to -Dmapred.job.queue.name=shortduration.
2nd: remove -Dmapred.job.queue.name=shortduration from the job creation. The job creation works well, but then I'm unable to specify which queue should be used.
I'm losing hope of getting my job to run in this queue.
Thanks for any help provided!
EDIT: I got an import working with sqoop import -Dmapred.job.queue.name=shortduration, but the sqoop job is still not working.
I think you have an error in your command; it should be:
-Dmapreduce.job.queuename=NameOfTheQueue
Note that queuename is one word, and mind the order: based on the documentation, the generic JVM args need to go directly after the tool name.
https://sqoop.apache.org/docs/1.4.3/SqoopUserGuide.html#_using_generic_and_specific_arguments
Generic Hadoop command-line arguments:
(must precede any tool-specific arguments)
Generic options supported are
-conf specify an application configuration file
-D use value for given property
sqoop job -Dmapreduce.job.queuename=shortduration \
--create myjob \
-- import \
--connect jdbc:teradata://RCT/DATABASE=MYDB \
--driver com.teradata.jdbc.TeraDriver \
--username DBUSER -P \
--query "$query" \
--target-dir /data/source/dest/$i \
--check-column DAT_CRN_AGG \
--incremental append \
--last-value 2001-01-01 \
--split-by NUM_CTR
You might just want to try it with the import tool first to see if it works correctly, then do the job command, i.e.
sqoop import -Dmapreduce.job.queuename=shortduration \
--connect jdbc:teradata://RCT/DATABASE=MYDB \
--driver com.teradata.jdbc.TeraDriver \
--username DBUSER -P \
--query "$query" \
--target-dir /data/source/dest/$i \
--check-column DAT_CRN_AGG \
--incremental append \
--last-value 2001-01-01 \
--split-by NUM_CTR
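Once the saved job works, you can run it and then check which queue the MapReduce job actually landed in, for example in the ResourceManager UI or, depending on your Hadoop version, with the mapred CLI:
# Run the saved job, then list jobs in the target queue (illustrative check).
sqoop job --exec myjob
mapred queue -info shortduration -showJobs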
