FastExport script for loading data into Hadoop? - hadoop

I am new to the FastExport script.
Can anyone give me an example FastExport script for loading data from Teradata into Hadoop?
I need the entire script.
Thanks in advance.

AFAIK you can't FastExport directly from Teradata to Hadoop.
But there's a new Teradata Connector for Hadoop which supports both import and export.
http://developer.teradata.com/connectivity/articles/teradata-connector-for-hadoop-now-available
The tutorial shows sample scripts to export to Hadoop:
https://developer.teradata.com/sites/all/files/Teradata%20Connector%20for%20Hadoop%20Tutorial%20v1%200%20final.pdf
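If it helps, here is a rough, hedged sketch of what a TDCH command line can look like for pulling a Teradata table into HDFS (in TDCH terms that direction is an "import" into Hadoop). The jar path, class name, option names, host, database, credentials and paths below are assumptions from memory, so double-check them against the linked tutorial PDF:

    # Placeholder jar path and connection details -- adjust for your TDCH version and site
    hadoop jar /usr/lib/tdch/teradata-connector.jar \
        com.teradata.connector.common.tool.ConnectorImportTool \
        -url jdbc:teradata://tdhost/DATABASE=sales \
        -username dbc -password dbc \
        -jobtype hdfs \
        -fileformat textfile \
        -sourcetable daily_orders \
        -targetpaths /data/teradata/daily_orders \
        -nummappers 8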

Related

How to export data to mainframe from Hadoop

Is there a way to export data from Hadoop to a mainframe using Sqoop? I am pretty new to mainframes.
I understand that we can sqoop in data from the mainframe to Hadoop. I skimmed through the Sqoop documentation, but it doesn't say anything about export.
I'd appreciate your help.
This appears to cover export: https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_literal_sqoop_export_literal
While I've not used sqoop, it appears to use a JDBC connection to a mainframe database. If you have that and the mainframe data table is already created (note in the doc: "The target table must already exist in the database."), then you should be able to connect to the mainframe database as the export destination. Many mainframe data sources (e.g. Db2 z/OS) support this.
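As a minimal, untested sketch (assuming a Db2 for z/OS target reachable over JDBC; the host, port, location, table, paths and credentials are placeholders, and the IBM JDBC driver jar must be on Sqoop's classpath), a Sqoop export could look like this:

    # Push HDFS files into an existing mainframe Db2 table over JDBC
    sqoop export \
        --connect jdbc:db2://mainframe-host:446/DB2LOCATION \
        --driver com.ibm.db2.jcc.DB2Driver \
        --username myuser \
        --password-file /user/myuser/db2.password \
        --table MYSCHEMA.TARGET_TABLE \
        --export-dir /data/hdfs/output \
        --input-fields-terminated-by ','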

Options for automating data offloading from Teradata to Hadoop

I am aware that Sqoop can be used to pull data from sources such as Teradata. But could you suggest how to load data from Teradata to Hadoop with fewer man-hours when there are hundreds of tables present?
Thanks in advance... :)
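One low-effort pattern (just a sketch, not a full answer: the connection string, driver class, paths and credentials are placeholders, and the Teradata JDBC driver jar must be available to Sqoop) is to drive sqoop import from a plain list of table names:

    # tables.txt contains one Teradata table name per line
    while read TABLE; do
        sqoop import \
            --connect jdbc:teradata://tdpid/DATABASE=sales \
            --driver com.teradata.jdbc.TeraDriver \
            --username loaduser \
            --password-file /user/loaduser/td.password \
            --table "$TABLE" \
            --target-dir "/data/teradata/$TABLE" \
            --num-mappers 4
    done < tables.txt

Sqoop also has an import-all-tables command, but it may or may not behave well with the generic JDBC manager and the Teradata driver, so test it on a couple of tables first.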

Load data into Greenplum DB using MapReduce or Sqoop

I want to try loading data into Greenplum using MapReduce or Sqoop. For now, the way to load a Greenplum DB from HDFS is to create an external table with gphdfs and then load the internal table. I want to try out a solution that loads the data directly into Greenplum with Sqoop or MapReduce. I need some input on how I can proceed on this. Could you please help me out?
With regard to Sqoop, sqoop export will help to achieve this.
http://www.tutorialspoint.com/sqoop/sqoop_export.htm
While not Sqoop, I am currently in the experimental phase of using Greenplum's external tables to load from HDFS. So far it seems to perform well.
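Since Greenplum speaks the PostgreSQL wire protocol, a hedged sketch of a Sqoop export via the standard PostgreSQL JDBC driver might look like this (the host, database, table, paths and credentials are placeholders, and the target table must already exist):

    # Export HDFS files into an existing Greenplum table through the master host
    sqoop export \
        --connect jdbc:postgresql://gpmaster:5432/analytics \
        --username gpadmin \
        --password-file /user/gpadmin/gp.password \
        --table sales_fact \
        --export-dir /data/hdfs/sales_fact \
        --input-fields-terminated-by '\t'

Bear in mind that Sqoop export pushes rows through the master over JDBC, so for large volumes the gphdfs external-table route mentioned above will likely be faster.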

Spark & HCatalog?

I feel comfortable loading HCatalog using Pig and was wondering if it's possible to use Spark instead of Pig. Unfortunately, I'm quite new to Spark...
Can you provide any materials on how to start? Are there any Spark libraries to use?
Any examples? I've done all the exercises on http://spark.apache.org/ but they focus on RDDs and don't go any further.
I will be grateful for any help...
Regards
Pawel
You can use Spark SQL to read from a Hive table instead of HCatalog.
https://spark.apache.org/sql/
You can apply the same transformations as in Pig (filter, join, group by, ...) using Spark's Java/Scala/Python APIs.
You can reference the following link for using the HCatalog InputFormat wrapper with Spark, which was written prior to Spark SQL.
https://gist.github.com/granturing/7201912
Our systems have loaded both and we can use either. Spark takes on the traits of the language you are using (Scala, Python, ...); for example, using Spark with Python you can utilize many Python libraries within Spark.
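To make that concrete, here is a minimal Scala sketch for Spark 1.x built with Hive support (hive-site.xml on the classpath); the database, table and column names are made up:

    // In spark-shell, `sc` is the provided SparkContext
    val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)

    // Read a Hive/HCatalog-managed table as a DataFrame
    val orders = hiveContext.sql("SELECT * FROM mydb.orders")

    // Pig-style FILTER and GROUP BY expressed on the DataFrame
    val big = orders.filter(orders("amount") > 100)
    val perCustomer = big.groupBy("customer_id").count()

    // Write the result back as a Hive table (Spark 1.4+ writer API)
    perCustomer.write.saveAsTable("mydb.big_orders_per_customer")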

S3 to local hdfs data transfer using Sqoop

I have installed Apache Hadoop on my local system and want to import data from Amazon S3 using Sqoop.
Is there any way to achieve this?
If yes, kindly help me with how I can achieve this.
Examples would be much appreciated.
Please help me as soon as possible.
Note: I am not using Amazon EMR.
Sqoop is for getting data from relational databases only at the moment.
Try using "distcp" for getting data from S3.
The usage is documented here: http://wiki.apache.org/hadoop/AmazonS3 in the section "Running bulk copies in and out of S3".
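A hedged example of that kind of distcp run (the bucket, keys, paths and NameNode address are placeholders; older Hadoop releases use the s3n:// scheme shown here, while newer ones use s3a://):

    # Copy a directory from S3 into the local HDFS
    hadoop distcp \
        -Dfs.s3n.awsAccessKeyId=MY_ACCESS_KEY \
        -Dfs.s3n.awsSecretAccessKey=MY_SECRET_KEY \
        s3n://my-bucket/input/ \
        hdfs://localhost:9000/user/me/input/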