I want to use the --direct parameter when I import data from Oracle. Is it possible to use the Data Pump utility with the --direct option? Do I need to install any Oracle utility in my shell environment? If so, what do I need to install?
Dharmesh
Unfortunately, there's no Sqoop connector that uses the DataPump utility.
Oracle does have its own (closed-source) big data connectors. I believe Oracle Loader for Hadoop uses the Data Pump format.
Oracle Big Data Connectors (the Loader) are used to load data from Hadoop into Oracle, but not from Oracle into Hadoop.
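For what it's worth, as far as I know `--direct` on an Oracle connection in Sqoop 1.4.5 and later selects the bundled Data Connector for Oracle and Hadoop rather than Data Pump, and it should need only the Oracle JDBC driver jar in Sqoop's lib directory, not any Oracle client utilities. Here is a minimal sketch of how such a command could be assembled. The host, service name, user, table, and paths are hypothetical placeholders, and the helper function is mine, not part of Sqoop.

```python
import shlex


def build_oracle_direct_import(host, service, user, password_file, table,
                               target_dir, mappers=4, port=1521):
    """Return the argv list for a `sqoop import --direct` run against Oracle."""
    # Thin-driver URL in service-name form; adjust if you connect by SID.
    jdbc_url = f"jdbc:oracle:thin:@//{host}:{port}/{service}"
    return [
        "sqoop", "import",
        "--direct",                        # request the direct (non-generic) connector
        "--connect", jdbc_url,
        "--username", user,
        "--password-file", password_file,  # keeps the password off the command line
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(mappers),
    ]


if __name__ == "__main__":
    # Every value here is a made-up placeholder.
    argv = build_oracle_direct_import(
        host="dbhost.example.com",
        service="ORCL",
        user="SCOTT",
        password_file="/user/scott/.oracle_pw",
        table="SCOTT.EMP",
        target_dir="/data/oracle/emp",
    )
    print(shlex.join(argv))
```

The list can be handed to `subprocess.run(argv, check=True)` on a node where Sqoop is installed, or the printed line can be pasted into a shell.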
Related
Is there a way to export data from Hadoop to a mainframe using Sqoop? I am pretty new to mainframes.
I understand that we can use Sqoop to bring data from a mainframe into Hadoop. I skimmed through the Sqoop documentation, but it doesn't say anything about export.
I'd appreciate your help.
This appears to cover export: https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_literal_sqoop_export_literal
While I've not used Sqoop, it appears to use a JDBC connection to a mainframe database. If you have that, and the target table already exists on the mainframe (note in the doc: "The target table must already exist in the database."), then you should be able to use the mainframe database as the export destination. Many mainframe data sources (e.g. Db2 z/OS) support JDBC connections.
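To make that concrete, here is a rough sketch of what a `sqoop export` invocation against a JDBC-reachable Db2 database might look like. The JDBC URL, user, table, and HDFS directory are placeholders I made up, and the helper function is only for illustration; the flags themselves are the ones described in the export section of the user guide linked above.

```python
import shlex


def build_sqoop_export(jdbc_url, user, password_file, table, export_dir,
                       field_delimiter=","):
    """Return the argv list for a `sqoop export` into an existing table."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--username", user,
        "--password-file", password_file,
        "--table", table,            # must already exist in the target database
        "--export-dir", export_dir,  # HDFS directory holding the rows to export
        "--input-fields-terminated-by", field_delimiter,
    ]


if __name__ == "__main__":
    # Hypothetical connection details for a Db2 database on the mainframe.
    argv = build_sqoop_export(
        jdbc_url="jdbc:db2://mainframe.example.com:446/LOCATION1",
        user="HADOOPUSR",
        password_file="/user/hadoop/.db2_pw",
        table="SALES.ORDERS",
        export_dir="/warehouse/orders_out",
    )
    print(shlex.join(argv))
```

The Db2 JDBC driver jar would also have to be on Sqoop's classpath for this to connect.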
I have used Sqoop to ingest data from Oracle into Hadoop, and it worked well. It took only 4 minutes to bring 86 million records from Oracle into a Hive table, without using partitions in Sqoop. Can anyone give some details about the Oracle Hadoop connectors? Will they perform better than Sqoop?
Most connectors will perform about the same, because each one ends up running a set of MapReduce jobs at the end of the workflow, and those jobs play the main role in overall performance.
Oracle provides a set of different connectors for accessing Hive, and the overview linked below covers the standard solutions, but I doubt that in the end you will see a significant performance difference from what you get with Sqoop:
https://docs.oracle.com/cd/E37231_01/doc.20/e36961/start.htm#BDCUG119
Sqoop is a generic tool for working with relational databases from the Hadoop side, and it is not limited to Oracle. It also integrates with other Hadoop tools such as Oozie for building complicated workflows, which makes it a good candidate over other types of connectors.
Personally, I prefer Sqoop for Hadoop-driven import/export operations and the connector approach for querying data in Hadoop.
Sqoop uses a standard JDBC connection. Oracle's connector works with a fastloader/fastexport class integrated into the Sqoop connection. It should be faster than plain Sqoop.
I want to export data from Hortonworks Hive to Cassandra
Is there a way to export data from Hortonworks Hive to DataStax Cassandra without using ETL tools?
You can use Sqoop for this.
Apache Sqoop
Apache Sqoop(TM) is a tool designed for efficiently transferring bulk
data between Apache Hadoop and structured datastores such as
relational databases.
Sqoop successfully graduated from the Incubator in March of 2012 and
is now a Top-Level Apache project.
Using Apache Spark with the Spark-Cassandra connector and saveToCassandra is another choice and one I see recommended more these days over Sqoop. You can use Spark as a basic load tool, or you can use it to also perform ETL transformations on your data.
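A minimal PySpark sketch of the Spark route, under some assumptions: `saveToCassandra` is the connector's RDD method on the Scala side, so from Python I'm using the DataFrame writer with the connector's `org.apache.spark.sql.cassandra` format instead. It assumes a SparkSession created with Hive support, the Spark Cassandra Connector on the classpath, and a Cassandra table that already exists. The table and keyspace names are placeholders, and the function name is my own.

```python
CASSANDRA_FORMAT = "org.apache.spark.sql.cassandra"


def copy_hive_table_to_cassandra(spark, hive_table, keyspace, table):
    """Append the rows of a Hive table to an existing Cassandra table.

    `spark` is expected to be a SparkSession built with enableHiveSupport()
    and with the Spark Cassandra Connector available to it.
    """
    df = spark.table(hive_table)  # resolved through the Hive metastore
    # Any ETL-style transformations on `df` could go here before the write.
    (df.write
       .format(CASSANDRA_FORMAT)
       .options(keyspace=keyspace, table=table)
       .mode("append")
       .save())
```

Typical use would be something like `copy_hive_table_to_cassandra(spark, "sales.orders", "shop", "orders")` from a job submitted with the connector package.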
I've spent the afternoon trying to wrap my head around how I can leverage dse sqoop to import a table from MySQL into Hive/Shark. In my case, I am not really interested in importing the table into Cassandra per se. Hive/Shark will do.
AFAIK, this should be possible, given that the dse sqoop import help shows options for creating a Hive table. I've been trying to execute something very similar to http://www.datastax.com/documentation/datastax_enterprise/4.5/datastax_enterprise/ana/anaSqpImport.html except I can't seem to get the Cassandra username/password credentials to work.
Should this be possible? How? Do I have to go through a CQL table?
I am running DSE 4.5.
Sounds like you're trying to do something similar to slide 47 in this deck:
http://www.slideshare.net/planetcassandra/escape-from-hadoop
The strategy Russell uses there is to read from MySQL directly in Spark through the MySQL JDBC driver, so there's no need to deal with Sqoop. You do have to add the driver dependency to your Spark classpath for it to work. No need to go through a CQL table.
Then you can join with C* data, write the data to C*, etc.
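I'm not reproducing the code from the deck here, but the general idea could look something like the PySpark sketch below. It assumes a Spark version that has the DataFrame JDBC reader (newer than the Spark shipped with DSE 4.5) and the MySQL Connector/J jar on the classpath. The host, database, table, and credentials are placeholders, the function names are mine, and the driver class shown is the Connector/J 8 name (older jars use `com.mysql.jdbc.Driver`).

```python
def mysql_jdbc_options(host, database, table, user, password,
                       port=3306, driver="com.mysql.cj.jdbc.Driver"):
    """Build the option map for Spark's JDBC data source."""
    return {
        "url": f"jdbc:mysql://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": driver,  # the driver jar must be on the Spark classpath
    }


def load_mysql_table(spark, **connection):
    """Read one MySQL table into a DataFrame through JDBC, with no Sqoop step."""
    return (spark.read
                 .format("jdbc")
                 .options(**mysql_jdbc_options(**connection))
                 .load())
```

From there the DataFrame can be registered as a temporary view and queried or joined like any other table.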
I want to analyze data that is in a database (MS SQL Server). How can I bring that data into HDFS with the help of Sqoop/Hive? Is it possible with Hive/Sqoop?
Please suggest how we can do it.
Thanks.
Microsoft recently released a SQL Server connector for Sqoop. There are also a few ETL tools (open source and not) that connect SQL Server to Hadoop, such as Talend.
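As a rough illustration, a Sqoop import from SQL Server over JDBC might be assembled like this. It is a sketch that assumes the Microsoft JDBC driver jar is in Sqoop's lib directory; the server, database, table, and paths are hypothetical, and the helper function is mine, not part of Sqoop or the Microsoft connector.

```python
import shlex


def build_sqlserver_import(host, database, user, password_file, table,
                           target_dir, hive_table=None, port=1433, mappers=4):
    """Return the argv list for a `sqoop import` from SQL Server into HDFS."""
    jdbc_url = f"jdbc:sqlserver://{host}:{port};databaseName={database}"
    argv = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", user,
        "--password-file", password_file,
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(mappers),
    ]
    if hive_table:
        # Optionally load the imported files into a Hive table as well.
        argv += ["--hive-import", "--hive-table", hive_table]
    return argv


if __name__ == "__main__":
    # Placeholder values only.
    argv = build_sqlserver_import(
        host="sqlhost.example.com",
        database="Sales",
        user="etl_user",
        password_file="/user/etl/.mssql_pw",
        table="Orders",
        target_dir="/data/sales/orders",
        hive_table="sales.orders",
    )
    # shlex.join quotes the URL, which matters because of the ';' in it.
    print(shlex.join(argv))
```

Once the data is in HDFS or Hive, the analysis itself can be done with ordinary Hive queries.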