HDFS path to load data to Hive - hadoop

I am running Hadoop as a single-node installation.
Following other posts, I moved a file to HDFS using
hadoop fs -put <local path> /usr/tmp/fileName.txt
Now I am trying to load the data from the HDFS file into a Hive table using the command below. I cannot figure out what HDFS path, relative to my local file system, I should provide in the command.
The load command I am using from my Java program to load the Hive table is
LOAD DATA IN PATH ('HDFS PATH as it relates to my local File System???'). All my attempts at giving the path, including /usr/tmp/fileName.txt, fail.
How do I resolve the full HDFS path?

Your syntax is incorrect; it should be:
load data local inpath '/tmp/categories01.psv' overwrite into table categories;
You have to specify LOCAL INPATH in the command when loading a file from the local file system.

This command loads data from the local file system:
LOAD DATA LOCAL INPATH './examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;
'LOCAL' signifies that the input file is on the local file system. If 'LOCAL' is omitted, Hive looks for the file in HDFS.
This command loads data from HDFS:
LOAD DATA INPATH './examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;
Have a look at this article for more details.

The syntax for loading a file from HDFS into Hive is:
LOAD DATA INPATH './examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;
Please clarify: how do I resolve the full HDFS path?
The full HDFS path in your syntax would be:
hdfs://<namenode-hostname>:<port>/your/file/path
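If you are not sure of the NameNode hostname and port, you can read them from the client configuration; assuming a standard Hadoop client install, this prints the fs.defaultFS value (e.g. hdfs://localhost:8020):
hdfs getconf -confKey fs.defaultFS
Prefix that value to the absolute HDFS path of your file, e.g. hdfs://localhost:8020/usr/tmp/fileName.txt.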

Related

SemanticException Line 1:23 Invalid path

I'm trying to load a text file from HDFS into a Hive table using the following command:
hive> load data local inpath '/user/hive/input/emp_details.txt' into table emp;
I'm getting the following exception:
FAILED: SemanticException Line 1:23 Invalid path ''/user/hive/input/emp_details.txt'': No files matching path file:/user/hive/input/emp_details.txt
I'm using Hive 1.2.2 on Hadoop 2.7.2 on CentOS 7.
I gave full permissions to the file path in HDFS using the following command:
hdfs dfs -chmod -R 777 /user/hive/input
Not sure what else is missing; could anyone please suggest what to do? Thanks in advance!
The LOCAL keyword means you are trying to load data from the local filesystem, not from HDFS (note how the error resolves your path as file:/user/hive/input/emp_details.txt).
You should use:
load data inpath '/user/hive/input/emp_details.txt' into table emp;
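Before loading, it is worth confirming the file really is at that path in HDFS (path taken from the question):
hdfs dfs -ls /user/hive/input/emp_details.txt
If this lists the file, the non-LOCAL form above will find it.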
See also: Difference between `load data inpath` and `location` in Hive?

Loading data into Hive Table from HDFS in Cloudera VM

When using the Cloudera VM, how can you access information in HDFS? I know there isn't a direct path to HDFS, but I also don't see how to dynamically access it.
After creating a Hive table through the Hive CLI, I attempted to load some data from a file located in HDFS:
load data inpath '/test/student.txt' into table student;
But then I just get this error:
FAILED: SemanticException Line 1:17 Invalid path ''/test/student.txt'': No files matching path hdfs://quickstart.cloudera:8020/test/student.txt
I also tried to load data that is not in HDFS into a Hive table, like so:
load data inpath '/home/cloudera/Desktop/student.txt' into table student;
However, that just produced this error:
FAILED: SemanticException Line 1:17 Invalid path ''/home/cloudera/Desktop/student.txt'': No files matching path hdfs://quickstart.cloudera:8020/home/cloudera/Desktop/student.txt
Once again I see it trying to access data with the root of hdfs://quickstart.cloudera:8020, and I'm not sure what that is, but it doesn't seem to be the root directory of HDFS.
I'm not sure what I'm doing wrong, but I made sure the file is located in HDFS, so I don't know why this error comes up or how to fix it.
how can you access information in HDFS
Well, you certainly don't need to use Hive to do it; hdfs dfs commands are how you interact with HDFS.
I'm not sure what that is, but it doesn't seem to be the root directory of HDFS
It is the root of HDFS. quickstart.cloudera is the hostname of the VM, and port 8020 is the HDFS NameNode port.
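To see that this URI is just the fully qualified form of the HDFS root (assuming the VM's default configuration), the following two commands list the same directory:
hdfs dfs -ls /
hdfs dfs -ls hdfs://quickstart.cloudera:8020/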
Your exceptions come from the difference in using the LOCAL keyword.
What you're doing:
LOAD DATA INPATH <hdfs location>
vs. what you seem to be wanting:
LOAD DATA LOCAL INPATH <local file location>
Or, if the files are in HDFS, it's not clear how you put them there, but HDFS definitely doesn't have a /home folder or a Desktop, so the second error at least makes sense.
Anyway, hdfs dfs -put /home/cloudera/Desktop/student.txt /test/ is one way to upload your file, assuming the hdfs:///test folder already exists. Otherwise, hdfs dfs -put /home/cloudera/Desktop/student.txt /test renames your file to /test on HDFS.
Note: you can create an EXTERNAL TABLE over an HDFS directory instead; you don't need to use the LOAD DATA command at all (see the sketch below).
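For example, a minimal sketch (the column definitions and the tab delimiter are assumptions, since the question doesn't show the table's schema):
CREATE EXTERNAL TABLE student (name STRING, age INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/test';
Any file placed under hdfs:///test is then visible to queries against student, with no LOAD DATA step.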

Why is data COPIED and not MOVED when loading from the local filesystem in Hive (Hadoop)

When we use the following command:
Load data local inpath "mypath"
why is the data copied from the local filesystem into HDFS rather than moved?
Since you are moving data between two different file systems (the local filesystem and HDFS), this cannot be a metadata-only operation, as it is in a non-local load; the data itself has to be copied.
Theoretically this command could also delete the source file afterwards, but what for?
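For contrast, the non-local form is a move within HDFS, which you can observe directly (a sketch with a hypothetical path; the warehouse location depends on your configuration):
hdfs dfs -ls /staging/data.txt     # the file exists before the load
hive -e "LOAD DATA INPATH '/staging/data.txt' INTO TABLE t;"
hdfs dfs -ls /staging/data.txt     # gone: it was moved into the table's warehouse directory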

Export data into Hive from a node without Hadoop (HDFS) installed

Is it possible to export data to a Hive server from a node that has neither Hadoop (HDFS) nor Sqoop installed?
I would read the data from a source, which could be MySQL or just files in some directory, and then use the Hadoop core classes or something like Sqoop to export the data into my Hadoop cluster.
I am programming in Java.
Since your final destination is a Hive table, I would suggest the following:
Create the final Hive table.
Use the following command to load data from the other node:
LOAD DATA LOCAL INPATH '<full local path>/kv1.txt' OVERWRITE INTO TABLE table_name;
Using Java, you could use the JSch library to invoke these shell commands over SSH.
Hope this helps.
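A rough sketch of that approach (the hostname, credentials, and paths are all placeholders; it assumes the remote node has the Hive client installed, and copies the file over with SFTP first so that LOCAL INPATH can see it):
import com.jcraft.jsch.ChannelExec;
import com.jcraft.jsch.ChannelSftp;
import com.jcraft.jsch.JSch;
import com.jcraft.jsch.Session;

public class RemoteHiveLoad {
    public static void main(String[] args) throws Exception {
        JSch jsch = new JSch();
        Session session = jsch.getSession("hiveuser", "hive-gateway.example.com", 22);
        session.setPassword("secret");                    // demo only; prefer key-based auth
        session.setConfig("StrictHostKeyChecking", "no"); // demo only
        session.connect();

        // Copy the local file to the node that has the Hive client.
        ChannelSftp sftp = (ChannelSftp) session.openChannel("sftp");
        sftp.connect();
        sftp.put("/local/data/kv1.txt", "/tmp/kv1.txt");
        sftp.disconnect();

        // Run the LOAD there; LOCAL INPATH now refers to that node's filesystem.
        ChannelExec exec = (ChannelExec) session.openChannel("exec");
        exec.setCommand("hive -e \"LOAD DATA LOCAL INPATH '/tmp/kv1.txt' OVERWRITE INTO TABLE table_name;\"");
        exec.connect();
        while (!exec.isClosed()) {
            Thread.sleep(100);                            // wait for the remote command to finish
        }
        exec.disconnect();
        session.disconnect();
    }
}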

Loading files into Hive through JDBC

I'm getting this error when trying to load a file into Hive through its JDBC driver. The Hive instance is running on a VM. The file loads perfectly fine when I load it through the Hive command line. The file is located on the same machine as Hive. I hope JDBC supports the LOAD command.
java.sql.SQLException: Query returned non-zero code: 10, cause: FAILED: Error in semantic analysis: Line 1:23 Invalid path ''/home/cloudera/Desktop/test.csv'': No files matching path file:/home/cloudera/Desktop/test.csv
at org.apache.hadoop.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:189)
at Main.main(Main.java:55)
Since Hive in turn runs in a MapReduce environment, you need to provide an HDFS path for the CSV file (not a local path) when using Hive JDBC. When running through the Hive CLI, a local path works because the CLI takes care of uploading the file to HDFS before loading it into the table.
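A minimal sketch of that approach in Java (the host, port, table name, and paths are assumptions based on the error above; the HiveServer2 driver is shown, whereas the stack trace uses the legacy org.apache.hadoop.hive.jdbc driver):
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LoadViaJdbc {
    public static void main(String[] args) throws Exception {
        // 1. Put the file into HDFS first, so the LOAD sees an HDFS path.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://quickstart.cloudera:8020");
        FileSystem fs = FileSystem.get(conf);
        fs.copyFromLocalFile(new Path("/home/cloudera/Desktop/test.csv"),
                             new Path("/tmp/test.csv"));

        // 2. Issue the LOAD over JDBC using the HDFS path (no LOCAL keyword).
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection con = DriverManager.getConnection(
                 "jdbc:hive2://quickstart.cloudera:10000/default", "cloudera", "");
             Statement stmt = con.createStatement()) {
            stmt.execute("LOAD DATA INPATH '/tmp/test.csv' INTO TABLE test");
        }
    }
}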
