Loading files into Hive through JDBC

I'm getting this error when trying to load a file into Hive through its JDBC driver. The Hive instance is running on a VM. The file loads perfectly fine when I load it through the Hive command line. The file is located on the same instance as Hive. I hope JDBC supports the LOAD command.
java.sql.SQLException: Query returned non-zero code: 10, cause: FAILED: Error in semantic analysis: Line 1:23 Invalid path ''/home/cloudera/Desktop/test.csv'': No files matching path file:/home/cloudera/Desktop/test.csv
at org.apache.hadoop.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:189)
at Main.main(Main.java:55)

Since Hive queries run in a MapReduce environment, you need to provide an HDFS path for the CSV file (not a local path) when using the Hive JDBC driver. When you run the same statement through the Hive CLI with the LOCAL keyword, a local path works because the CLI takes care of copying the file into HDFS before loading it into the table.
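A minimal sketch of the fix, assuming you can write to an HDFS staging directory (/user/cloudera/staging here is a placeholder, as is the table name test): first copy the file into HDFS, then issue the load through the same JDBC statement using the HDFS path.
hdfs dfs -mkdir -p /user/cloudera/staging
hdfs dfs -put /home/cloudera/Desktop/test.csv /user/cloudera/staging/
LOAD DATA INPATH '/user/cloudera/staging/test.csv' INTO TABLE test;
Note that LOAD DATA INPATH moves the HDFS file into the table's warehouse directory, so the staging copy disappears after the load.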

Related

How is Hive running without a hive-site.xml file?

I am trying to set up Hive on my local machine. I started all the Hadoop processes and added {hive}/bin to the path. From the command prompt I can run Hive commands and create and read tables. My questions are:
1) Is hive-site.xml an optional file?
2) In the absence of hive-site.xml, how does Hive get information regarding the metastore and other configuration?
If you're running Hive queries from your local machine, which has Hadoop installed, hive-site.xml is not needed: you are talking directly to hive/bin in the Hive installation directory, and Hive falls back to its built-in defaults, including an embedded Derby metastore created in your working directory (the metastore_db folder). You don't need to tell Hive where to find Hive.
If you wanted to run Hive commands from another machine, but interacting with Hive on your local machine, you'd need hive-site.xml.
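A minimal sketch of what such a hive-site.xml could contain; the metastore host, port, and warehouse path are placeholders:
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore-host:9083</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
</configuration>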

Loading data into Hive Table from HDFS in Cloudera VM

When using the Cloudera VM, how can you access information in HDFS? I know there isn't a direct path to HDFS from the local file system, but I also don't see how to access it dynamically.
After creating a Hive Table through the Hive CLI I attempted to load some data from a file located in the HDFS:
load data inpath '/test/student.txt' into table student;
But then I just get this error:
FAILED: SemanticException Line 1:17 Invalid path ''/test/student.txt'': No files matching path hdfs://quickstart.cloudera:8020/test/student.txt
I also tried to just load data not in the HDFS into a Hive Table like so:
load data inpath '/home/cloudera/Desktop/student.txt' into table student;
However that just produced this error:
FAILED: SemanticException Line 1:17 Invalid path ''/home/cloudera/Desktop/student.txt'': No files matching path hdfs://quickstart.cloudera:8020/home/cloudera/Desktop/student.txt
Once again I see it trying to access data under the root hdfs://quickstart.cloudera:8020, and I'm not sure what that is, but it doesn't seem to be the root directory of HDFS.
I'm not sure what I'm doing wrong, but I made sure the file is located in HDFS, so I don't know why this error comes up or how to fix it.
how can you access information in the HDFS
Well, you certainly don't need to use Hive to do it. hdfs dfs commands are how you interact with HDFS.
I'm not sure what that is, but it doesn't seem to be the root directory for the HDFS
It is the root of HDFS. quickstart.cloudera is the hostname of the VM. Port 8020 is the HDFS port.
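For example, both of these list the same root, since fs.defaultFS on the VM already points at hdfs://quickstart.cloudera:8020:
hdfs dfs -ls hdfs://quickstart.cloudera:8020/
hdfs dfs -ls /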
Your exceptions come from the difference the LOCAL keyword makes.
What you're doing
LOAD DATA INPATH <hdfs location>
VS what you seem to be wanting
LOAD DATA LOCAL INPATH <local file location>
If the files really are in HDFS, it's not clear how you put them there, but HDFS definitely doesn't have a /home folder or a Desktop, so the second error at least makes sense.
Anyway, hdfs dfs -put /home/cloudera/Desktop/student.txt /test/ is one way to upload your file, assuming the hdfs:///test folder already exists. Otherwise, hdfs dfs -put /home/cloudera/Desktop/student.txt /test renames your file to /test on HDFS.
Note: You can create an EXTERNAL TABLE over an HDFS directory, you don't need to use the LOAD DATA command.
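A minimal sketch of that alternative; the two-column schema and tab delimiter are assumptions about student.txt:
CREATE EXTERNAL TABLE student (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/test';
Hive will then read every file that lands in /test, with no LOAD DATA step needed.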

Error creating hive table: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException

I have a multi-node Hadoop cluster, and I have now installed Hive on the namenode.
I'm trying to create some Hive tables from files stored in HDFS, but I'm getting this strange error:
FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.DDLTask.
MetaException(message:hdfs://namenode-VirtualBox:9000/data/posts
/posts.tbl is not a directory or unable to create one)
hive>
Then I tried to create a table from a file stored in HDFS that is only 2 KB, and the table was created successfully.
But when I try to create a table from a larger file in HDFS, around 200 MB (and possibly less), I get that error.
Do you know why this error might be happening?
I believe that somewhere in the code the URL hdfs://namenode-VirtualBox:9000/data/posts/posts.tbl is parsed, and that the URL should not include the file suffix (.tbl); it should just be ".../posts".
I refer you to: Unable to Create Table in HIVE reading a CSV from HDFS
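A sketch of the fix under that reading: keep posts.tbl in a directory of its own and point the table at the directory, not the file. The schema and the pipe delimiter are assumptions:
hdfs dfs -mkdir -p /data/posts
hdfs dfs -put posts.tbl /data/posts/
CREATE EXTERNAL TABLE posts (id INT, body STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
LOCATION 'hdfs://namenode-VirtualBox:9000/data/posts';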

HDFS path to load data to Hive

I am running Hadoop as a single-node distribution.
Following the posts, I moved a file to HDFS using
hadoop fs -put <local path> </usr/tmp/fileNAme.txt>
Now I am trying to load the data from the HDFS file into a Hive table using the command below. I am not able to work out what HDFS path, relative to my local file system, I should be providing in the command.
The load command I am using from my Java program to load the Hive table is
LOAD DATA IN PATH ('HDFS PATH as it relates to my local File System???'). All my attempts at giving the path, including /usr/tmp/fileNAme.txt, fail.
How do I resolve the full HDFS path?
The syntax is incorrect:
load data local inpath '/tmp/categories01.psv' overwrite into table categories;
You have to specify local inpath in the command.
This command loads data from local file system
LOAD DATA LOCAL INPATH './examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;
'LOCAL' signifies that the input file is on the local file system. If 'LOCAL' is omitted then it looks for the file in HDFS.
This command loads data from HDFS file system.
LOAD DATA INPATH './examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;
Have a look at this article for more details.
The syntax for loading a file from HDFS into Hive is
LOAD DATA INPATH './examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;
Please clarify: how do I resolve the full HDFS path?
The full HDFS path in your syntax would be
hdfs://<namenode-hostname>:<port>/your/file/path
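A sketch of how to work that prefix out on a single-node setup; mytable is a placeholder, and localhost:9000 stands in for whatever fs.defaultFS reports:
hdfs getconf -confKey fs.defaultFS
hdfs dfs -ls /usr/tmp/fileNAme.txt
LOAD DATA INPATH 'hdfs://localhost:9000/usr/tmp/fileNAme.txt' INTO TABLE mytable;
The fully qualified prefix is optional for files already in HDFS: LOAD DATA INPATH '/usr/tmp/fileNAme.txt' resolves against fs.defaultFS.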

Issue with LOAD DATA into Hive

We have launched two EMR clusters in AWS, installing Hadoop with hive-0.11.0 on one and hive-0.13.1 on the other.
Everything seems to be working fine, but when trying to load data into a table we get the error below, and it happens on both Hive servers.
ERROR MESSAGE:
An error occurred when executing the SQL command: load data inpath
's3://buckername/export/employee_1/' into table employee_2 Query
returned non-zero code: 10028, cause: FAILED: SemanticException [Error
10028]: Line 1:17 Path is not legal
''s3://buckername/export/employee_1/'': Move from:
s3://buckername/export/employee_1 to:
hdfs://XXX.XX.XXX.XX:X000/mnt/hive_0110/warehouse/employee_2 is not
valid. Please check that values for params "default.fs.name" and
"hive.metastore.warehouse.dir" do not conflict. [SQL State=42000, DB
Errorcode=10028]
I searched for the reason and meaning of this message and found this link, but when I tried to execute the command suggested there, it also gave the error below.
Command:
--service metatool -updateLocation hdfs://XXX.XX.XXX.XX:X000 hdfs://XXX.XX.XXX.XX:X000
Initializing HiveMetaTool.. HiveMetaTool:Parsing failed. Reason:
Unrecognized option: -hiveconf
Any help with this will be really appreciated.
LOAD does not support S3. It is best practice to leave the data in S3 and use it as a Hive external table instead of copying it to HDFS. Some references: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-hive-additional-features.html and "When you create an external table in Hive with an S3 location is the data transfered?"
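A minimal sketch of the external-table approach; the schema and comma delimiter are assumptions about the export files, and the table is named employee_s3 to avoid clashing with the existing employee_2:
CREATE EXTERNAL TABLE employee_s3 (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://buckername/export/employee_1/';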
If you have installed Hive on your Hadoop cluster, the default storage for Hive data is HDFS (hive.metastore.warehouse.dir=/user/hive/warehouse).
As a workaround, you can copy the file from the S3 file system to HDFS, and then load it into Hive from HDFS.
You may also need to modify the parameter hive.exim.uri.scheme.whitelist=hdfs,pfile to load data from the S3 file system.
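A sketch of that workaround, assuming the cluster's S3 filesystem is configured (it is on EMR); the HDFS staging path /tmp/employee_1 is a placeholder:
hadoop distcp s3://buckername/export/employee_1/ hdfs:///tmp/employee_1/
LOAD DATA INPATH '/tmp/employee_1/' INTO TABLE employee_2;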
