Sqoop syntax to import to a Kudu table

We'd like to test Kudu and need to import data, and Sqoop seems like the right tool. I've found references saying you can import to Kudu, but no specifics. Is there any way to import into Kudu using Sqoop?

Not at this time. See:
https://issues.apache.org/jira/browse/SQOOP-2903 - Add Kudu connector for Sqoop

Related

How to Sqoop Import as JSON?

I am aware Sqoop supports importing data as Avro, Parquet, text, etc. Is there a way to import data as JSON?
Using Spark is not an option for me at the moment.
Sqoop does not support JSON as an import format. You can import the data as text files into HDFS and then parse it with Python or Scala.
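For example, a minimal sketch of that route (the connection string, credentials, table name, and target directory below are made up for illustration; adjust them for your environment):

# Import as delimited text; JSON conversion happens downstream.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username sqoop_user -P \
  --table orders \
  --as-textfile \
  --fields-terminated-by ',' \
  --target-dir /data/orders_text

The delimited files under /data/orders_text can then be read and serialized to JSON with a small Python or Scala job.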

Importing data to hbase using sqoop

When I import data into Hive using Sqoop, I can specify --hive-home <dir> and Sqoop will use that copy of Hive installed on the machine where the script is executed. But what about HBase? How does Sqoop know which HBase instance/database I want the data imported into?
Maybe the documentation helps?
By specifying --hbase-table, you instruct Sqoop to import to a table in HBase rather than a directory in HDFS
Every example I see just shows that option along with column families and so on, so I assume it depends on whatever variables are set in sqoop-env.sh, as the Hortonworks docs describe.
When you give the Hive home directory, that's not telling it any database or table information either, but rather where the Hive configuration files exist on the machine you're running Sqoop on. By default, that's taken from the environment variable $HIVE_HOME.
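For illustration, an HBase import looks something like the sketch below (the connection details, table names, column family, and row key are all hypothetical). Sqoop finds the HBase cluster from the HBase configuration on its classpath (e.g. HBASE_HOME set in sqoop-env.sh), not from a command-line flag:

# Sketch only: source table, HBase table, column family, and row key are made up.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username sqoop_user -P \
  --table customers \
  --hbase-table customers \
  --column-family cf \
  --hbase-row-key customer_id \
  --hbase-create-table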

How do I import from a MySQL database to Datastax DSE Hive using sqoop?

I've spent the afternoon trying to wrap my head around how I can leverage dse sqoop to import a table from MySQL to Hive/Shark. In my case, I'm not really interested in importing the table into Cassandra per se; Hive/Shark will do.
AFAIK this should be possible, given that the dse sqoop import help gives me options to create a Hive table. I've been trying to execute something very similar to http://www.datastax.com/documentation/datastax_enterprise/4.5/datastax_enterprise/ana/anaSqpImport.html, except I can't seem to get the Cassandra username/password credentials to work.
Should this be possible? How? Do I have to go through a CQL table?
I am running DSE 4.5.
Sounds like you're trying to do something similar to slide 47 in this deck:
http://www.slideshare.net/planetcassandra/escape-from-hadoop
The strategy Russell uses there is to read from MySQL with the Spark MySQL (JDBC) driver, so there's no need to deal with Sqoop. You do have to add the dependency to your Spark classpath for it to work. No need to go through a CQL table.
Then you can join with c* data, write the data to c*, etc.
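As a rough sketch of the classpath step (the jar path and version are assumptions, and the exact mechanism depends on your Spark/DSE version), you can make the MySQL JDBC driver visible before starting the shell:

# Put the MySQL JDBC driver on the Spark classpath, then start the DSE Spark shell.
# The jar location is an assumption; point it at wherever the connector is installed.
export SPARK_CLASSPATH=/path/to/mysql-connector-java-5.1.34-bin.jar
dse spark

From the shell you can then read the MySQL table over JDBC and join it with, or write it to, Cassandra data.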

Direct import from Oracle to Hadoop using Sqoop

I want to use the --direct parameter when I import data from Oracle. Is it possible to use the Data Pump utility with the --direct option? Do I need to install any Oracle utility on my machine? If so, please suggest what I need to install.
Dharmesh
Unfortunately, there's no Sqoop connector that uses the DataPump utility.
Oracle does have its own (closed-source) big data connectors; I believe Oracle Loader for Hadoop uses the Data Pump format. However, that connector is used to move data from Hadoop into Oracle, not from Oracle into Hadoop.
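So the practical route is an ordinary JDBC-based import; a minimal sketch (host, service name, credentials, and table are placeholders):

# Plain JDBC import from Oracle; no Data Pump involved.
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username SCOTT -P \
  --table EMPLOYEES \
  --num-mappers 4 \
  --target-dir /data/employees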

Can't Query Hive Tables after Sqoop Import

I imported several tables from an Oracle DB into Hive via Sqoop. The command looked something like this:
./sqoop import --connect jdbc:oracle:thin:@//185.2.252.52:1521/orcl --username USER_NAME --password test --table TABLENAME --hive-import
I'm using an embedded metastore (at least I think so; I have not changed the default configuration in that regard). When I do SHOW TABLES in Hive, the imported tables do not show up, but some tables I've created for testing via the command line do. The tables are all in the same warehouse directory on HDFS, so it seems like the Sqoop import is not using the same metastore.
But where is it? And how can I switch to it when querying from the command line?
thanks
I think the entire problem is the embedded metastore: by default, Hive creates it in the user's current working directory if it doesn't already exist, so Sqoop can end up using a different metastore than the Hive CLI. I would recommend configuring MySQL as the backend for the metastore.
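To see the symptom, compare the working directories: with the embedded Derby metastore, each client creates its own metastore_db in whatever directory it was launched from (the paths below are just examples):

# Each directory a Hive client (or Sqoop's Hive import) was started from can
# end up with its own Derby metastore_db; the paths here are illustrative.
ls /home/user/sqoop-jobs/metastore_db
ls /home/user/metastore_db

Pointing both clients at a shared metastore (e.g. MySQL-backed, configured in hive-site.xml) avoids the split.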
