Export data into Hive from a node without Hadoop (HDFS) installed

Is it possible to export data to a Hive server from a node that does not have Hadoop (HDFS) or Sqoop installed?
I would read the data from a source, which could be MySQL or just files in some directory, and then use the Hadoop core classes or something like Sqoop to export the data into my Hadoop cluster.
I am programming in Java.

Since your final destination is a Hive table, I would suggest the following:
Create the final Hive table.
Use the following command to load the data from the other node:
LOAD DATA LOCAL INPATH '<full local path>/kv1.txt' OVERWRITE INTO TABLE table_name;
Refer to the Hive documentation on LOAD DATA for details.
From Java, you could use the JSch library to invoke these shell commands over SSH, as in the sketch below.
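A minimal sketch of that approach with JSch, assuming the node with Hive installed is reachable over SSH; the host, user, password, file path and table name are placeholders:

import com.jcraft.jsch.ChannelExec;
import com.jcraft.jsch.JSch;
import com.jcraft.jsch.Session;

import java.io.InputStream;

public class RemoteHiveLoad {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details for the node that has Hive installed.
        String host = "hive-node.example.com";
        String user = "hadoop";
        String password = "secret";

        // LOAD DATA LOCAL INPATH runs on the Hive node, so the file must
        // already exist there (copy it over first, e.g. with SFTP).
        String hiveCommand = "hive -e \"LOAD DATA LOCAL INPATH '/tmp/kv1.txt' "
                + "OVERWRITE INTO TABLE table_name;\"";

        JSch jsch = new JSch();
        Session session = jsch.getSession(user, host, 22);
        session.setPassword(password);
        session.setConfig("StrictHostKeyChecking", "no"); // demo only
        session.connect();

        ChannelExec channel = (ChannelExec) session.openChannel("exec");
        channel.setCommand(hiveCommand);
        InputStream out = channel.getInputStream();
        channel.connect();

        // Print whatever the remote hive CLI writes to stdout.
        byte[] buf = new byte[1024];
        int n;
        while ((n = out.read(buf)) != -1) {
            System.out.write(buf, 0, n);
        }
        channel.disconnect();
        session.disconnect();
    }
}

Note that the file referenced by LOAD DATA LOCAL INPATH has to be present on the Hive node, so you would first copy it over (for example with JSch's SFTP channel) before running the command.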
Hope this helps.

Related

How is Hive running without a hive-site.xml file?

I am trying to set up Hive on my local machine. I started all the Hadoop processes and set up the {hive}/bin path. From the command prompt I can run Hive commands, and create and read tables. My questions are:
1) Is hive-site.xml an optional file?
2) In the absence of the hive-site.xml file, how does Hive get the information regarding the metastore and other configuration?
If you're running Hive queries from your local machine, which has Hadoop installed, hive-site.xml is not needed: you are invoking hive/bin directly from the Hive installation directory, and in the absence of hive-site.xml Hive simply falls back to its built-in defaults (an embedded Derby metastore created in a local metastore_db directory, and a warehouse directory of /user/hive/warehouse). You don't need to tell Hive where to find Hive.
If you wanted to run Hive commands from another machine, while interacting with the Hive installation on your local machine, you'd need a hive-site.xml that points to it.
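One way to see where those defaults come from is to instantiate HiveConf with no hive-site.xml on the classpath; a minimal sketch (dependency: hive-common; the printed values are the stock defaults, not anything site-specific):

import org.apache.hadoop.hive.conf.HiveConf;

public class ShowHiveDefaults {
    public static void main(String[] args) {
        // With no hive-site.xml on the classpath, HiveConf falls back to the
        // built-in default values.
        HiveConf conf = new HiveConf();

        // Default warehouse location: /user/hive/warehouse
        System.out.println(conf.getVar(HiveConf.ConfVars.METASTOREWAREHOUSE));

        // Default metastore connection: an embedded Derby database
        // (jdbc:derby:;databaseName=metastore_db;create=true)
        System.out.println(conf.getVar(HiveConf.ConfVars.METASTORECONNECTURLKEY));
    }
}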

Hive with emrfs

I am importing tables from Amazon RDS to Hive using Sqoop. The process is working and the data is being stored in the Hive default HDFS directory: /user/hive/warehouse.
I need to change the storage location from HDFS to EMRFS (S3).
It is my understanding that I need to change (in hive-site.xml on the master node) the value of the property hive.metastore.warehouse.dir to the s3://bucket/warehouse-location. It appears that I don't have permission to modify the file hive-site.xml.
I am looking for some advice on how best to do this.
Sudi
You will need sudo privileges to modify the hive-site.xml file on the master node (usually located in /etc/hive/conf/hive-site.xml).
If this is not an option, try setting this property before the cluster is started. An example with CloudFormation:
"Configurations" : [
{
"Classification" : "hive-site",
"ConfigurationProperties" : {
"hive.metastore.warehouse.dir" : "s3://your_s3_bucket/hive_warehouse/",
}
}
],
Or through the EMR dialogue, in the "Edit software settings" section when creating the cluster.
If you do have sudo access, editing the file directly looks like this:
sudo vi /etc/hive/conf/hive-site.xml
or
sudo -su root
vi /etc/hive/conf/hive-site.xml
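If you launch the cluster programmatically from Java instead of CloudFormation or the console, the same hive-site classification can, as far as I know, be passed through the EMR API; a rough sketch with the AWS SDK for Java v1, where the cluster name, release label, instance types and roles are placeholders:

import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduce;
import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClientBuilder;
import com.amazonaws.services.elasticmapreduce.model.Application;
import com.amazonaws.services.elasticmapreduce.model.Configuration;
import com.amazonaws.services.elasticmapreduce.model.JobFlowInstancesConfig;
import com.amazonaws.services.elasticmapreduce.model.RunJobFlowRequest;
import com.amazonaws.services.elasticmapreduce.model.RunJobFlowResult;

import java.util.Collections;

public class LaunchEmrWithS3Warehouse {
    public static void main(String[] args) {
        // Uses the region and credentials configured in the environment.
        AmazonElasticMapReduce emr = AmazonElasticMapReduceClientBuilder.defaultClient();

        // Same setting as the CloudFormation snippet above: point the Hive
        // warehouse at an S3 location instead of HDFS.
        Configuration hiveSite = new Configuration()
                .withClassification("hive-site")
                .withProperties(Collections.singletonMap(
                        "hive.metastore.warehouse.dir",
                        "s3://your_s3_bucket/hive_warehouse/"));

        RunJobFlowRequest request = new RunJobFlowRequest()
                .withName("hive-on-emrfs")          // placeholder name
                .withReleaseLabel("emr-5.30.0")     // placeholder release
                .withApplications(new Application().withName("Hive"))
                .withConfigurations(hiveSite)
                .withServiceRole("EMR_DefaultRole")
                .withJobFlowRole("EMR_EC2_DefaultRole")
                .withInstances(new JobFlowInstancesConfig()
                        .withInstanceCount(3)
                        .withMasterInstanceType("m4.large")
                        .withSlaveInstanceType("m4.large")
                        .withKeepJobFlowAliveWhenNoSteps(true));

        RunJobFlowResult result = emr.runJobFlow(request);
        System.out.println("Started cluster: " + result.getJobFlowId());
    }
}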
If you are using Hive on EMR, it is recommended to keep the Hive metastore in an external DB or to use the Glue Data Catalog as the metastore.
For your concern:
Create the tables you want to import as external tables in Hive. While creating an external table you have to provide the LOCATION parameter with the S3 location of the table.
Example: Suppose I have an S3 bucket named bucket-xyz and I want my data to be stored at s3://bucket-xyz/my-table, where my table name is my-table. Then I will create my-table as an external table using Hive.
CREATE EXTERNAL TABLE `my-table` (A VARCHAR(30), B DOUBLE)
ROW FORMAT DELIMITED ...
LOCATION 's3://bucket-xyz/my-table';
After this, when you insert data into this table using Hive, Hive will store the data in the S3 location you specified.

HDFS path to load data to Hive

I am running Hadoop as a single-node installation.
Following the posts, I moved a file to HDFS using
hadoop fs -put <local path> </usr/tmp/fileNAme.txt>
Now I am trying to load the data from the HDFS file into a Hive table using the command below. I am not able to find out what HDFS path, relative to my local file system, I should be providing in the command.
The load command I am using from my Java program to load the Hive table is
LOAD DATA IN PATH ('HDFS PATH as it relates to my local File System???' ). All my attempts at giving the path, including /usr/tmp/fileNAme.txt, fail.
How do I resolve the full HDFS path?
The syntax is incorrect. It should be:
load data local inpath '/tmp/categories01.psv' overwrite into table categories;
You have to specify LOCAL INPATH in the command when loading from the local file system.
This command loads data from the local file system:
LOAD DATA LOCAL INPATH './examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;
'LOCAL' signifies that the input file is on the local file system. If 'LOCAL' is omitted, Hive looks for the file in HDFS.
This command loads data from HDFS:
LOAD DATA INPATH './examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;
Have a look at the Hive documentation on LOAD DATA for more details.
The syntax for loading a file from HDFS into Hive is
LOAD DATA INPATH './examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;
Please clarify how I resolve the full HDFS path.
The full HDFS path in your syntax would be
hdfs://<namenode-hostname>:<port>/your/file/path
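Since the question mentions issuing the load from a Java program, here is a minimal sketch of running the corrected statement over Hive JDBC against HiveServer2; the URL, credentials, table name and file path are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HiveLoadExample {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // HiveServer2 endpoint; host, port and credentials are placeholders.
        String url = "jdbc:hive2://localhost:10000/default";

        try (Connection con = DriverManager.getConnection(url, "hive", "");
             Statement stmt = con.createStatement()) {
            // No LOCAL keyword: the path below is an HDFS path, e.g. the
            // target of the earlier "hadoop fs -put" command.
            stmt.execute("LOAD DATA INPATH '/usr/tmp/fileNAme.txt' "
                    + "OVERWRITE INTO TABLE my_table");
        }
    }
}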

How to get the HFile from sqoop after bulk hbase import?

I use Sqoop to do a bulk HBase import, with the Sqoop option --hbase-bulkload. Sqoop generates HFiles and imports them into my HBase. I can verify the data is there, and in the Sqoop log it tries to load the HFile from
INFO mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://sandbox.hortonworks.com:8020/tmp/sqoop/data/u/2ce542f59b56466d988e49f7a7e512b7 first=\x00\x00\x00\x00\x00\x01\xDE1\xF8 last=\x00\x00\x00\x00\x00\x01\xEB:L
However, after the job is done and I try to see the files, they are not there anymore. I am using this hadoop command to view the files:
hadoop fs -ls /tmp/sqoop/data
Is the HFile stored somewhere else? Or is there an option to keep it after the import job?
Thanks
I have done an import into HBase from Oracle using Sqoop itself. After the import process completed, the files were stored in HDFS under
/home/USERNAME/FILENAME(TABLENAME)
I think your HFiles will also be stored following the same convention, so it is better to check there first.
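If it is easier to check from your Java code than from the shell, a small sketch with the Hadoop FileSystem API; the directory to inspect is a placeholder:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListHdfsDir {
    public static void main(String[] args) throws Exception {
        // Uses the cluster configuration on the classpath (core-site.xml etc.).
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Placeholder: the directory where you expect the HFiles to be.
        Path dir = new Path("/tmp/sqoop/data");

        for (FileStatus status : fs.listStatus(dir)) {
            System.out.println(status.getPath() + "  " + status.getLen() + " bytes");
        }
    }
}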

Can we export data from Hadoop to CSV using Sqoop?

I tried to read the Sqoop documentation and there is currently no mention of how to export to CSV format using Sqoop.
Is it possible to export data from Hadoop to CSV using Sqoop?
Is there any solution?
You don't need Sqoop to copy data from Hadoop to the local filesystem. Sqoop strictly works with importing/exporting data to/from an RDBMS (using JDBC).
Instead, you can just copy the data from Hadoop to the local filesystem using the hadoop command-line tool: hadoop fs -get [hadoop_src] [local_dest].
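The same copy can be done from Java with the Hadoop FileSystem API; a minimal sketch, where the source and destination paths are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyFromHdfs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Placeholders: an HDFS source (e.g. a file written by a Hive or
        // MapReduce job) and a destination on the local filesystem.
        Path hadoopSrc = new Path("/user/hive/warehouse/my_table/000000_0");
        Path localDest = new Path("/tmp/my_table.csv");

        // Equivalent of: hadoop fs -get [hadoop_src] [local_dest]
        fs.copyToLocalFile(hadoopSrc, localDest);
    }
}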
