SemanticException Line 1:23 Invalid path - hadoop

I'm trying to load a text file from HDFS into a Hive table using the following command:
hive> load data local inpath '/user/hive/input/emp_details.txt' into table emp;
I'm getting the following exception:
FAILED: SemanticException Line 1:23 Invalid path ''/user/hive/input/emp_details.txt'': No files matching path file:/user/hive/input/emp_details.txt
I'm using Hive 1.2.2 on Hadoop 2.7.2 on CentOS 7.
I gave full permissions to the file path in HDFS using the following command:
hdfs dfs -chmod -R 777 /user/hive/input
Not sure what else is missing; could anyone please suggest what to do? Thanks in advance!

The LOCAL keyword means you are loading data from the local filesystem, not from HDFS.
You should use:
load data inpath '/user/hive/input/emp_details.txt' into table emp;
See also Difference between `load data inpath` and `location` in hive?
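As a sanity check before loading, you can confirm the file is actually visible to HDFS (using the path from the question):
hdfs dfs -ls /user/hive/input/emp_details.txt
Also note that LOAD DATA INPATH moves the file into the table's warehouse directory rather than copying it, so the source path will be empty after a successful load.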

Related

Loading data into Hive Table from HDFS in Cloudera VM

When using the Cloudera VM, how can you access information in HDFS? I know there isn't a direct path to HDFS, but I also don't see how to dynamically access it.
After creating a Hive table through the Hive CLI, I attempted to load some data from a file located in HDFS:
load data inpath '/test/student.txt' into table student;
But then I just get this error:
FAILED: SemanticException Line 1:17 Invalid path ''/test/student.txt'': No files matching path hdfs://quickstart.cloudera:8020/test/student.txt
I also tried to load data that is not in HDFS into a Hive table, like so:
load data inpath '/home/cloudera/Desktop/student.txt' into table student;
However, that just produced this error:
FAILED: SemanticException Line 1:17 Invalid path ''/home/cloudera/Desktop/student.txt'': No files matching path hdfs://quickstart.cloudera:8020/home/cloudera/Desktop/student.txt
Once again I see it trying to access data with the root of hdfs://quickstart.cloudera:8020, and I'm not sure what that is, but it doesn't seem to be the root directory of HDFS.
I'm not sure what I'm doing wrong, but I made sure the file is located in HDFS, so I don't know why this error is coming up or how to fix it.
how can you access information in HDFS
Well, you certainly don't need to use Hive to do it. hdfs dfs commands are how you interact with HDFS.
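For example, using the paths from the question:
hdfs dfs -ls /
hdfs dfs -put /home/cloudera/Desktop/student.txt /test/
hdfs dfs -cat /test/student.txt
The first lists the HDFS root, the second copies a local file into HDFS, and the third prints a file already stored in HDFS.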
I'm not sure what that is, but it doesn't seem to be the root directory of HDFS
It is the root of HDFS. quickstart.cloudera is the hostname of the VM. Port 8020 is the HDFS port.
Your exceptions come from the presence or absence of the LOCAL keyword.
What you're doing
LOAD DATA INPATH <hdfs location>
vs. what you seem to want
LOAD DATA LOCAL INPATH <local file location>
If the files really are in HDFS, it's not clear how you put them there, but HDFS definitely doesn't have a /home folder or a Desktop, so the second error at least makes sense.
Anyway, hdfs dfs -put /home/cloudera/Desktop/student.txt /test/ is one way to upload your file, assuming the hdfs:///test folder already exists. Otherwise, hdfs dfs -put /home/cloudera/Desktop/student.txt /test renames your file to /test on HDFS.
Note: you can create an EXTERNAL TABLE over an HDFS directory; you don't need to use the LOAD DATA command at all.
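For instance, a minimal sketch, assuming the data was uploaded to /test and treating each line as a single string (the real column types depend on your file):
CREATE EXTERNAL TABLE student_ext (line STRING)
LOCATION '/test';
Dropping an external table later removes only the metadata; the files under /test stay in place.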

Where does Hive store tables locally?

I have created a Hive table and am trying to locate where Hive has created the HDFS file for this table locally. The Hive version is 2.3.0.
I tried this command to retrieve the location of my table:
hive> describe formatted table_name;
I got this as output (only showing the relevant part; tb2 is the table_name in this case):
Location: hdfs://localhost:54310/user/hive/warehouse/tb2
I have no clue how to navigate to hdfs://localhost:54310 locally (from the terminal). Also, the table is not present in the Hadoop default directory.
Try running the command below to view the Hive table. In the output you will find a folder named after your table:
hdfs dfs -ls /user/hive/warehouse/tb2
A table in Hive is basically a folder in HDFS.
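Because the table lives in HDFS rather than on the local disk, you browse it with hdfs dfs instead of cd. To print the table's data files (tb2 is the table from the question):
hdfs dfs -cat /user/hive/warehouse/tb2/*
The hdfs://localhost:54310 prefix is just the namenode URI; hdfs dfs commands resolve plain paths against it automatically.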

Error creating hive table: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException

I have a multi-node Hadoop cluster, and I have now installed Hive on the namenode.
I'm trying to create some Hive tables from files stored in HDFS, but I'm getting this strange error:
FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.DDLTask.
MetaException(message:hdfs://namenode-VirtualBox:9000/data/posts
/posts.tbl is not a directory or unable to create one)
hive>
Then I tried to create a table from a file stored in HDFS that is only 2 KB, and the table was created successfully.
But when I try to create a table from a larger file stored in HDFS, around 200 MB (and maybe less), I get that error.
Do you know why this error might be happening?
I believe that somewhere in the code the URL hdfs://namenode-VirtualBox:9000/data/posts/posts.tbl is parsed, and the URL should not have the file suffix (.tbl); it should just be ".../posts".
I refer you to: Unable to Create Table in HIVE reading a CSV from HDFS
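A minimal sketch of that fix (the columns and the pipe delimiter are assumptions, since the question doesn't show the table definition): point LOCATION at the directory that contains posts.tbl, not at the file itself:
CREATE EXTERNAL TABLE posts (id INT, body STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
LOCATION '/data/posts';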

HDFS path to load data to Hive

I am running Hadoop as a single-node distribution.
Following the posts, I moved a file to HDFS using
hadoop fs -put <local path> /usr/tmp/fileName.txt
Now I am trying to load the data from the HDFS file into a Hive table using the command below. I am not able to figure out what HDFS path, relative to my local file system, I should provide in the command.
The load command I am using from my Java program to load the Hive table is
LOAD DATA IN PATH ('HDFS PATH as it relates to my local File System???'). All my attempts at giving the path, including /usr/tmp/fileName.txt, fail.
How do I resolve the full HDFS path?
The syntax is incorrect. It should be:
load data local inpath '/tmp/categories01.psv' overwrite into table categories;
You have to specify LOCAL INPATH in the command.
This command loads data from the local file system:
LOAD DATA LOCAL INPATH './examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;
'LOCAL' signifies that the input file is on the local file system. If 'LOCAL' is omitted then it looks for the file in HDFS.
This command loads data from HDFS:
LOAD DATA INPATH './examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;
Have a look at this article for more details.
The syntax for loading a file from HDFS into Hive is:
LOAD DATA INPATH './examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;
Please clarify how do I resolve the full HDFS path.
The full HDFS path in your syntax would be:
hdfs://<namenode-hostname>:<port>/your/file/path
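If you don't know the namenode hostname and port, you can ask Hadoop for the default filesystem URI and prepend it to the absolute path:
hdfs getconf -confKey fs.defaultFS
If that prints, say, hdfs://localhost:9000 (illustrative), the full path would be hdfs://localhost:9000/usr/tmp/fileName.txt. A plain absolute path like /usr/tmp/fileName.txt also works in LOAD DATA INPATH, since Hive resolves it against the default filesystem.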

Issue with load data into Hive

We have launched two EMR clusters in AWS and installed Hadoop on both, with hive-0.11.0 on one and hive-0.13.1 on the other.
Everything seems to be working fine, but when trying to load data into a TABLE it gives the below error, and it happens on both Hive servers.
ERROR MESSAGE:
An error occurred when executing the SQL command: load data inpath
's3://buckername/export/employee_1/' into table employee_2 Query
returned non-zero code: 10028, cause: FAILED: SemanticException [Error
10028]: Line 1:17 Path is not legal
''s3://buckername/export/employee_1/'': Move from:
s3://buckername/export/employee_1 to:
hdfs://XXX.XX.XXX.XX:X000/mnt/hive_0110/warehouse/employee_2 is not
valid. Please check that values for params "default.fs.name" and
"hive.metastore.warehouse.dir" do not conflict. [SQL State=42000, DB
Errorcode=10028]
I searched for the reason and meaning of this message and found this link, but when I tried to execute the command suggested in that link, it also gave the below error.
Command:
hive --service metatool -updateLocation hdfs://XXX.XX.XXX.XX:X000 hdfs://XXX.XX.XXX.XX:X000
Initializing HiveMetaTool.. HiveMetaTool:Parsing failed. Reason:
Unrecognized option: -hiveconf
Any help in this will be really appreciated.
LOAD does not support S3. It is best practice to leave data in S3 and just use it as a Hive external table instead of copying the data to HDFS. Some references: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-hive-additional-features.html and When you create an external table in Hive with an S3 location is the data transfered?
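A minimal sketch of that approach (the columns and delimiter are assumptions, since the question doesn't show the schema of employee_2):
CREATE EXTERNAL TABLE employee_2 (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://buckername/export/employee_1/';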
If you have installed Hive on your Hadoop cluster, the default storage for Hive data is HDFS (hive.metastore.warehouse.dir=/user/hive/warehouse).
As a workaround, you can copy the file from the S3 file system to HDFS, and then load the file into Hive from HDFS.
You will most likely need to add s3 to the parameter hive.exim.uri.scheme.whitelist (for example, hive.exim.uri.scheme.whitelist=hdfs,pfile,s3) to load data from the S3 file system.
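A sketch of that workaround, assuming the cluster can read S3 directly (as EMR clusters can) and using an arbitrary HDFS staging path:
hadoop distcp s3://buckername/export/employee_1/ /tmp/employee_1/
hive> load data inpath '/tmp/employee_1/' into table employee_2;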
