How to get the HFile from sqoop after bulk hbase import? - hadoop

I use Sqoop to do a bulk HBase import with the --hbase-bulkload option. Sqoop generates HFiles and loads them into my HBase table. I can verify the data is there, and in the Sqoop log it tries to load an HFile from
INFO mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://sandbox.hortonworks.com:8020/tmp/sqoop/data/u/2ce542f59b56466d988e49f7a7e512b7 first=\x00\x00\x00\x00\x00\x01\xDE1\xF8 last=\x00\x00\x00\x00\x00\x01\xEB:L
However, after the job is done, when I try to look at the files they are no longer there. I am using this hadoop command to view the files:
hadoop fs -ls /tmp/sqoop/data
Is the HFile stored somewhere else? Or is there an option to keep it after the import job?
Thanks

I have imported data into HBase from Oracle using Sqoop as well. After the import process completed, the file was stored in HDFS under
/home/USERNAME/FILENAME(TABLENAME)
I think your HFile is stored following the same convention, so it is worth checking there.
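If you want to double-check, you can list both the Sqoop staging directory and HBase's own data directory with plain HDFS commands. This is only a sketch: the HBase path below assumes the default HDP sandbox hbase.rootdir (/apps/hbase/data) and a table in the default namespace, so substitute your own values.
# the staging path from the Sqoop log
hadoop fs -ls /tmp/sqoop/data
# where HBase keeps its store files once a bulk load completes (the load tool moves the HFiles here)
hadoop fs -ls -R /apps/hbase/data/data/default/<TABLE_NAME>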

Related

How to import images to HDFS? I am using CDH version 5.11

I have Sqoop, Flume and Spark installed on my system, but I am not sure how to import image files.
I am able to import data from an RDBMS using Sqoop successfully, and I am able to import text files using Flume.
How do I import images into HDFS?
Hadoop doesn't have a concept of file type (like Windows does, for example) so you can use any tool to import images into Hadoop.
If you have images in a BLOB column, you would use SQOOP.
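For illustration, a Sqoop import of a table holding images in a BLOB column could look roughly like this; the connection string, table and column names are made up, and --inline-lob-limit controls how large a LOB may be before Sqoop writes it out to a separate LOB file.
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username scott -P \
  --table IMAGES \
  --columns "IMG_ID,IMG_DATA" \
  --inline-lob-limit 16777216 \
  --target-dir /user/testuser/images \
  -m 1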
Flume supports binary data so you could use BlobDeserializer.
BlobDeserializer
This deserializer reads a Binary Large Object (BLOB) per event, typically one BLOB per file. For example a PDF or JPG file. Note that this approach is not suitable for very large objects because the entire BLOB is buffered in RAM.
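A minimal sketch of the source side of a Flume agent using the spooling-directory source with the BLOB deserializer (agent name, spool directory and maxBlobLength are placeholders; a channel and an HDFS sink are still needed to complete the agent):
# emit one Flume event per whole file dropped into the spool directory, e.g. one JPG per event
a1.sources = src1
a1.sources.src1.type = spooldir
a1.sources.src1.spoolDir = /var/flume/incoming-images
a1.sources.src1.deserializer = org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder
a1.sources.src1.deserializer.maxBlobLength = 100000000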
In HDFS, the basic commands -put or -copyFromLocal will work.
$ hdfs dfs -put about.png /tmp
$ hdfs dfs -ls /tmp/about.png
-rw-r--r-- 3 testuser supergroup 53669 2017-06-30 11:34 /tmp/about.png
$
Or you can use the WebHDFS APIs to do this remotely.
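For example, WebHDFS file creation is a two-step call; the host, port and user below are assumptions (50070 is the classic NameNode HTTP port):
# step 1: ask the NameNode for a write location; it answers with a 307 redirect to a DataNode
curl -i -X PUT "http://namenode-host:50070/webhdfs/v1/tmp/about.png?op=CREATE&user.name=testuser"
# step 2: send the file bytes to the URL returned in the Location header of step 1
curl -i -X PUT -T about.png "<Location-header-URL-from-step-1>"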
References:
Import BLOB (Image) from oracle to hive
https://flume.apache.org/FlumeUserGuide.html#blobdeserializer
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#Create_and_Write_to_a_File

Sqoop Import from Hive to Hive

Can we import tables from one Hive data source to another Hive data source using Sqoop?
The query looks like this:
sqoop import --connect jdbc:hive2://localhost:10000/default --driver org.apache.hive.jdbc.HiveDriver --username root --password root --table student1 -m 1 --target-dir hdfs://localhost:9000/user/dummy/hive2result
Right now it is throwing the exception below:
15/07/19 19:50:18 ERROR manager.SqlManager: Error reading from database: java.sql.SQLException: Method not supported
java.sql.SQLException: Method not supported
at org.apache.hive.jdbc.HiveResultSetMetaData.isSigned(HiveResultSetMetaData.java:141)
at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:290)
at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:240)
at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:226)
at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:295)
at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1773)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1578)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:96)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:601)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Sqoop is not a tool for transferring data from one Hive instance to another. It sounds like your requirement is to transfer Hive data from one cluster to another, which can be achieved using hadoop distcp. The name Sqoop itself stands for SQL-to-Hadoop and vice versa.
If you want to migrate multiple databases and tables from one Hive instance to another, the best approach is to transfer the data with hadoop distcp and run the DDLs on the second Hive instance. If you don't have the DDLs handy, no need to worry (a rough command sketch is at the end of this answer):
Just take a dump of the metastore database.
Open the dump file in a text editor.
Replace the old HDFS URI with the new HDFS URI.
Import the MySQL dump into the metastore of the second Hive instance.
Refresh the tables.
An example is given in the blog post below:
https://amalgjose.wordpress.com/2013/10/11/migrating-hive-from-one-hadoop-cluster-to-another-cluster-2/
Note that distcp will work only for external tables. For managed (transactional) tables, use the EXPORT and IMPORT DDL statements instead.
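A rough command-line sketch of the copy and the URI rewrite, assuming a MySQL-backed metastore and made-up host names, ports and paths:
# copy the warehouse directories between the clusters
hadoop distcp hdfs://old-nn:8020/apps/hive/warehouse hdfs://new-nn:8020/apps/hive/warehouse
# rewrite the HDFS URI inside the metastore dump, then load it into the new metastore database
sed -i 's|hdfs://old-nn:8020|hdfs://new-nn:8020|g' metastore_dump.sql
mysql -u hive -p metastore < metastore_dump.sql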

Sqoop import is showing error - Jobtracker is not yet running

I am trying to do a Sqoop import. It shows the error "Jobtracker is not yet running".
However, when I try sqoop eval and select a few rows, it works.
But while doing the import I get the error. I have included a snapshot of both the eval and the import commands that I tried.
Plain hadoop commands (hadoop fs -ls, -put) are working.
I started the cluster with start-all.sh.
Afterwards, I checked with jps and all the daemons were running.
After a few minutes, all the daemons stop.
sqoop eval just runs the statement against the RDBMS and returns the result set; Hadoop does not come into the picture.
sqoop import tries to import the data from the RDBMS and load it into HDFS, but your HDFS and MapReduce daemons are not staying up, so the connection fails. Restart Hadoop and check the JobTracker and NameNode logs. Also check whether the NameNode and DataNode storage directories configured in hdfs-site.xml exist and are writable; otherwise point them to a new directory.
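A quick way to check, as a sketch (the config and log locations below assume a typical tarball install under $HADOOP_HOME; adjust to your layout):
# find the storage directories configured in hdfs-site.xml
grep -A1 'dfs.name.dir\|dfs.data.dir' $HADOOP_HOME/conf/hdfs-site.xml
# confirm each directory exists locally and is writable by the user running the daemons
ls -ld /path/returned/above
# see why the daemons die a few minutes after start-all.sh
tail -n 100 $HADOOP_HOME/logs/hadoop-*-namenode-*.log
tail -n 100 $HADOOP_HOME/logs/hadoop-*-datanode-*.log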

Can we export data from hadoop to csv using sqoop

I have tried reading the Sqoop documentation, and there is no mention of how to export to CSV format using Sqoop.
Is it possible to export data from Hadoop to CSV using Sqoop?
Is there any solution?
You don't need Sqoop to copy data from Hadoop to the local filesystem. Sqoop strictly works with importing/exporting data to/from an RDBMS (using JDBC).
Rather, you can just copy the data from Hadoop to the local filesystem using the hadoop command-line tool: hadoop fs -get [hadoop_src] [local_dest].
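For example (the table path is an assumption, and the field delimiter will be whatever the table was created with, not necessarily a comma):
# merge all part files of the table directory into a single local file
hadoop fs -getmerge /user/hive/warehouse/mytable ./mytable.csv
# if the table uses Hive's default ^A delimiter, convert it to commas
tr '\001' ',' < mytable.csv > mytable_commas.csv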

Export data into Hive from a node without Hadoop(HDFS) installed

Is it possible to export data to a Hive server from a node that does not have Hadoop (HDFS) or Sqoop installed?
I would read the data from a source, which could be MySQL or just files in some directory, and then use the Hadoop core classes or something like Sqoop to push the data into my Hadoop cluster.
I am programming in Java.
Since your final destination is a Hive table, I would suggest the following:
Create the final Hive table.
Use the following command to load data from the other node:
LOAD DATA LOCAL INPATH '<full local path>/kv1.txt' OVERWRITE INTO TABLE table_name;
Using Java, you could use the JSch library to invoke these shell commands remotely.
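As a rough sketch of what those JSch calls would script, expressed as plain shell (host names and paths are made up):
# copy the extract from the node without Hadoop to a gateway node that has the Hive client installed
scp /data/kv1.txt user@gateway-node:/tmp/kv1.txt
# run the load on the gateway node; LOCAL INPATH refers to that node's filesystem
ssh user@gateway-node "hive -e \"LOAD DATA LOCAL INPATH '/tmp/kv1.txt' OVERWRITE INTO TABLE table_name;\""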
Hope this helps.
