I tried reading the Sqoop documentation, and there is no answer so far on how to export to CSV format using Sqoop.
Is it possible to export data from Hadoop to CSV using Sqoop?
Is there any solution?
You don't need Sqoop to copy data from Hadoop to the local filesystem. Sqoop strictly works with importing/exporting data to/from an RDBMS (over JDBC).
Instead, you can just copy the data from Hadoop to the local filesystem with the Hadoop command-line tool: hadoop fs -get [hadoop_src] [local_dest].
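For example, a rough sketch (the HDFS path and local destinations below are made up; adjust them to your own layout): if the data in HDFS is already comma-delimited text, pulling it down gives you a CSV directly, and hadoop fs -getmerge can collapse the part files into a single local file.
# copy the whole HDFS output directory to the local filesystem
$ hadoop fs -get /user/hive/warehouse/mytable /tmp/mytable_csv
# or merge all part files into one local CSV file
$ hadoop fs -getmerge /user/hive/warehouse/mytable /tmp/mytable.csv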
Related
I am using Sqoop version 1.4.5 and Hadoop version 3.3.4. My requirement is to connect to a remote Hive and a remote Hadoop file system, secured with Kerberos, without changing the configuration files.
Is it possible to do the following operation without amending the configuration files for Hadoop and Sqoop? If yes, then which parameters need to be changed in the configuration files?
I have Sqoop, Flume and Spark installed on my system, but I am not sure how to import image files.
I am able to import data from an RDBMS using Sqoop successfully, and I am able to import text files using Flume.
How do I import images onto HDFS?
Hadoop doesn't have a concept of file type (like Windows does, for example) so you can use any tool to import images into Hadoop.
If you have images in a BLOB column, you would use SQOOP.
Flume supports binary data so you could use BlobDeserializer.
BlobDeserializer
This deserializer reads a Binary Large Object (BLOB) per event, typically one BLOB per file. For example a PDF or JPG file. Note that this approach is not suitable for very large objects because the entire BLOB is buffered in RAM.
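As a sketch of how that could be wired up, here is a hypothetical Flume agent configuration (the agent, source, channel and sink names, the spool directory, and the HDFS path are all placeholders) that reads whole image files from a spooling directory as single BLOB events and writes the raw bytes to HDFS:
# Hypothetical agent: one image file per event via BlobDeserializer
agent.sources = imgsrc
agent.channels = memch
agent.sinks = hdfssink

agent.sources.imgsrc.type = spooldir
agent.sources.imgsrc.spoolDir = /var/flume/images
agent.sources.imgsrc.deserializer = org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder
agent.sources.imgsrc.channels = memch

agent.channels.memch.type = memory

agent.sinks.hdfssink.type = hdfs
agent.sinks.hdfssink.hdfs.path = /tmp/flume/images
agent.sinks.hdfssink.hdfs.fileType = DataStream
agent.sinks.hdfssink.channel = memch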
In HDFS, the basic commands -put or -copyFromLocal will work.
$ hdfs dfs -put about.png /tmp
$ hdfs dfs -ls /tmp/about.png
-rw-r--r-- 3 testuser supergroup 53669 2017-06-30 11:34 /tmp/about.png
$
Or you can use the WebHDFS APIs to do this remotely.
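For instance, creating a file over WebHDFS is a two-step PUT (the hostname and the Hadoop 3 default port below are placeholders; check your cluster's values): the NameNode answers the first request with a redirect, and the second request sends the actual bytes to the DataNode it points at.
# Step 1: ask the NameNode where to write; it replies with a 307 redirect (Location header)
$ curl -i -X PUT "http://<namenode-host>:9870/webhdfs/v1/tmp/about.png?op=CREATE&overwrite=true"
# Step 2: upload the file content to the URL from the Location header
$ curl -i -X PUT -T about.png "<location-header-url>"
See the WebHDFS link under References for the full details.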
References:
Import BLOB (Image) from oracle to hive
https://flume.apache.org/FlumeUserGuide.html#blobdeserializer
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#Create_and_Write_to_a_File
Is there any way to have Sqoop connected to different Hadoop clusters, so that multiple Sqoop jobs can be created to export data to multiple Hadoop clusters?
to export data to multiple hadoop clusters
If data is going into Hadoop, that's technically a Sqoop import.
It's not clear how you currently manage different clusters from one machine, but you would need to have the conf folder of each environment available for Sqoop to read (see the example after the quoted documentation below).
The sqoop command-line program is a wrapper which runs the bin/hadoop script shipped with Hadoop. If you have multiple installations of Hadoop present on your machine, you can select the Hadoop installation by setting the $HADOOP_HOME environment variable.
For example:
$ HADOOP_HOME=/path/to/some/hadoop sqoop import --arguments...
or:
$ export HADOOP_HOME=/some/path/to/hadoop
$ sqoop import --arguments...
If $HADOOP_HOME is not set, Sqoop will use the default installation location for Cloudera’s Distribution for Hadoop, /usr/lib/hadoop.
The active Hadoop configuration is loaded from $HADOOP_HOME/conf/, unless the $HADOOP_CONF_DIR environment variable is set
https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_controlling_the_hadoop_installation
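So, as a sketch (the conf directory paths below are invented; point them at wherever each cluster's *-site.xml files actually live), you could keep a copy of every cluster's configuration directory on the Sqoop machine and pick one per job with $HADOOP_CONF_DIR:
# run a job against cluster A
$ HADOOP_CONF_DIR=/etc/hadoop/conf.clusterA sqoop import --arguments...
# run a job against cluster B
$ HADOOP_CONF_DIR=/etc/hadoop/conf.clusterB sqoop import --arguments...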
Depending on how you set up Hadoop: Hortonworks only ships Sqoop 1, while Cloudera (and maybe MapR) have Sqoop 2, and those instructions are probably different since the Sqoop 2 architecture is different.
I use Sqoop to do a bulk HBase import with the Sqoop option --hbase-bulkload. Sqoop generates HFiles and loads them into my HBase. I can verify that the data is there, and from the Sqoop log it tries to load the HFile from:
INFO mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://sandbox.hortonworks.com:8020/tmp/sqoop/data/u/2ce542f59b56466d988e49f7a7e512b7 first=\x00\x00\x00\x00\x00\x01\xDE1\xF8 last=\x00\x00\x00\x00\x00\x01\xEB:L
However, after the job is done, I try to see the files and they are not there anymore. I am using this Hadoop command to view the files:
hadoop fs -ls /tmp/sqoop/data
Is the HFile stored somewhere else? Or is there an option to keep it after the import job?
Thanks
I have imported data into HBase from Oracle using Sqoop as well. After the import process completed, the file was stored in the HDFS file system under
/home/USERNAME/FILENAME(TABLENAME)
I think your HFile is stored following the same convention, so it is worth checking there (see the listing commands below).
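A quick sketch of the check (USERNAME and TABLENAME are placeholders, and the HBase root directory shown is only an assumption based on common Hortonworks sandbox defaults):
# look for the file under the HDFS path mentioned above
$ hadoop fs -ls /home/USERNAME/
# the bulk-load step moves the generated HFiles into HBase's own storage,
# so they may also appear under the (assumed) HBase root directory
$ hadoop fs -ls -R /apps/hbase/data/data/default/TABLENAME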
Is it possible to export data to a Hive server from a node that does not have Hadoop (HDFS) or Sqoop installed?
I would read the data from a source, which could be MySQL or just files in some directory, and then use the Hadoop core classes or something like Sqoop to export the data into my Hadoop cluster.
I am programming in Java.
Since your final destination is a Hive table, I would suggest the following:
Create the final Hive table.
Use the following command to load the data from the other node:
LOAD DATA LOCAL INPATH '<full local path>/kv1.txt' OVERWRITE INTO TABLE table_name;
Refer to this.
From Java, you could use the JSch library to invoke these shell commands over SSH.
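As a rough example of the shell command such an SSH session would run (the path and table name are placeholders, and this assumes the Hive CLI is installed on the remote node alongside the file):
# run on (or via SSH into) a node that has the Hive client and the local file
$ hive -e "LOAD DATA LOCAL INPATH '/full/local/path/kv1.txt' OVERWRITE INTO TABLE table_name;"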
Hope this helps.