Hive error when creating an external table (state=08S01,code=1)

I'm trying to create an external table in Hive, but keep getting the following error:
create external table foobar (a STRING, b STRING) row format delimited fields terminated by "\t" stored as textfile location "/tmp/hive_test_1375711405.45852.txt";
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1)
Aborting command set because "force" is false and command failed: "create external table foobar (a STRING, b STRING) row format delimited fields terminated by "\t" stored as textfile location "/tmp/hive_test_1375711405.45852.txt";"
The contents of /tmp/hive_test_1375711405.45852.txt are:
abc\tdef
I'm connecting via the beeline command line interface, which uses Thrift HiveServer2.
System:
Hadoop 2.0.0-cdh4.3.0
Hive 0.10.0-cdh4.3.0
Beeline 0.10.0-cdh4.3.0
Client OS - Red Hat Enterprise Linux Server release 6.4 (Santiago)

The issue was that I was pointing the external table at a file in HDFS instead of a directory. The cryptic Hive error message really threw me off.
The solution is to create a directory and put the data file in it. To fix the example above, create the directory /tmp/foobar and move hive_test_1375711405.45852.txt into it. Then create the table like so:
create external table foobar (a STRING, b STRING) row format delimited fields terminated by "\t" stored as textfile location "/tmp/foobar";
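The directory fix can be sketched as a small shell function. It assumes the hdfs client is on PATH and that your release accepts mkdir -p (if it does not, create the directory without the flag). The function name move_file_into_dir is hypothetical.

```shell
# Hypothetical helper: move one HDFS file into a directory of its own so an
# external table's LOCATION can name the directory rather than the file.
move_file_into_dir() {
  local src="$1"   # existing HDFS file, e.g. /tmp/hive_test_1375711405.45852.txt
  local dir="$2"   # directory that will hold it, e.g. /tmp/foobar
  hdfs dfs -mkdir -p "$dir" &&
    hdfs dfs -mv "$src" "$dir/"
}
```

Usage: move_file_into_dir /tmp/hive_test_1375711405.45852.txt /tmp/foobar, then run the CREATE EXTERNAL TABLE statement with location "/tmp/foobar".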

We faced a similar problem at our company (a Sentry, Hive, and Kerberos combination). We solved it by removing every privilege granted on an HDFS URI that was not fully qualified and granting it again on the fully qualified URI. For example, we changed GRANT ALL ON URI '/user/test' TO ROLE test; to GRANT ALL ON URI 'hdfs-ha-name:///user/test' TO ROLE test;
You can find the privileges for a specific URI in the Hive database (mysql in our case).

Related

HiveAccessControlException Permission denied: user [hive] does not have [ALL] privilege on [hdfs://sandbox-....:8020/user/..] (state=42000,code=40000)

When I try to load a CSV file from the local file system on the Hadoop sandbox into a Hive table, I get the following exception:
LOCATION 'hdfs://sandbox-hdp.hortonworks.com:8020/user/maria_dev/practice';
Error: Error while compiling statement: FAILED: HiveAccessControlException Permission denied: user [hive] does not have [ALL] privilege on [hdfs://sandbox-hdp.hortonworks.com:8020/user/maria_dev/practice] (state=42000,code=40000)
This is the code I used. Can you please suggest a solution?
CREATE TABLE Sales_transactions(
Transaction_date DATE,
Product STRING,
Price FLOAT,
Payment_Type STRING,
Name STRING,
City STRING,
State STRING,
Country STRING,
Account_Created TIMESTAMP,
Last_Login TIMESTAMP,
Latitude FLOAT,
Longitude FLOAT,
Zip STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'hdfs://sandbox-hdp.hortonworks.com:8020/user/maria_dev/practice';  -- the error points at this line
It's actually a two-step process, and I think you missed step 1 (assuming your user has all the proper access).
Step 1 - Copy the local file into HDFS.
hdfs dfs -put ~/Sales_transactions.csv hdfs://sandbox-hdp.hortonworks.com:8020/user/maria_dev/practice
Step 2 - Then load the HDFS data into the table.
load data inpath 'hdfs://sandbox-hdp.hortonworks.com:8020/user/maria_dev/practice/Sales_transactions.csv' into table myDB.Sales_transactions_table;
Alternatively, you can load the local file directly (Hive does not expand ~, so give the full local path):
LOAD DATA LOCAL INPATH '/full/path/to/Sales_transactions.csv' INTO TABLE mydb.Sales_transactions_table;
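The two steps can be combined into one shell function. This is a sketch that assumes the hdfs and hive commands are on PATH and that your hdfs dfs supports -put -f (overwrite an existing file); with beeline you would run beeline -u <jdbc-url> -e instead of hive -e. The function name stage_and_load and its argument order are hypothetical.

```shell
# Hypothetical helper combining step 1 (copy to HDFS) and step 2 (LOAD DATA).
# $1 = local file, $2 = HDFS directory, $3 = target Hive table (db.table)
stage_and_load() {
  local local_file="$1" hdfs_dir="$2" table="$3"
  local file_name="${local_file##*/}"   # basename of the local file
  hdfs dfs -mkdir -p "$hdfs_dir" &&
    hdfs dfs -put -f "$local_file" "$hdfs_dir/" &&
    hive -e "LOAD DATA INPATH '$hdfs_dir/$file_name' INTO TABLE $table;"
}
```

Usage: stage_and_load ~/Sales_transactions.csv /user/maria_dev/practice mydb.Sales_transactions_table. Here ~ is expanded by your shell before the path reaches hdfs, which is why it works in this position but not inside a quoted Hive path.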

Hadoop backend with millions of records insertion

I am new to Hadoop. Can someone suggest how to upload millions of records to Hadoop? Can I do this with Hive, and where can I see my Hadoop records?
So far I have used Hive to create the database on Hadoop, and I am accessing it at localhost:50070. However, I am unable to load data from a CSV file into Hadoop from the terminal; it gives me this error:
FAILED: Error in semantic analysis: Line 2:0 Invalid path ''/user/local/hadoop/share/hadoop/hdfs'': No files matching path hdfs://localhost:54310/usr/local/hadoop/share/hadoop/hdfs
Can anyone suggest me some way to resolve it?
I assume the data is initially on the local file system.
A simple workflow is then: copy the data from the local file system to the Hadoop file system (HDFS), create a Hive table, and load the data into the Hive table.
Step 1:
# put the files in HDFS
$ hadoop fs -put /local_path/file_pattern* /path/to/your/HDFS_directory
# check the files
$ hadoop fs -ls /path/to/your/HDFS_directory
Step 2:
CREATE EXTERNAL TABLE if not exists mytable (
Year int,
name string
)
row format delimited
fields terminated by ','
lines terminated by '\n'
stored as TEXTFILE;
-- display the table structure
describe mytable;
Step 3:
-- the files are already in HDFS after step 1, so LOCAL must be omitted
LOAD DATA INPATH '/path/to/your/HDFS_directory'
OVERWRITE INTO TABLE mytable;
-- simple Hive statement to fetch 10 records
SELECT * FROM mytable LIMIT 10;
You should use LOAD DATA LOCAL INPATH <local-file-path> to load files from a local directory into Hive tables.
If you don't specify LOCAL, the LOAD command looks up the given file path in HDFS.
Please refer to the link below:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables
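The LOCAL rule can be illustrated with a small shell function that prints the matching statement for a path. Treating "exists on this machine" as "local" is a heuristic assumption of this sketch, and as far as I know, when the statement runs through HiveServer2 (for example via beeline), LOCAL refers to the file system of the HiveServer2 host rather than your client. The function name load_data_stmt is hypothetical.

```shell
# Hypothetical helper: print LOAD DATA LOCAL INPATH for paths that exist on
# this machine's file system, and plain LOAD DATA INPATH (HDFS) otherwise.
load_data_stmt() {
  local path="$1" table="$2"
  if [ -e "$path" ]; then
    printf "LOAD DATA LOCAL INPATH '%s' INTO TABLE %s;\n" "$path" "$table"
  else
    printf "LOAD DATA INPATH '%s' INTO TABLE %s;\n" "$path" "$table"
  fi
}
```

Usage: hive -e "$(load_data_stmt /home/me/data.csv mytable)".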

Hive creates empty table, even there're plenty of file

I put some files into HDFS (/path/to/directory/) that contain data like the following:
63 EB44863EA74AA0C5D3ECF3D678A7DF59
62 FABBC9ED9719A5030B2F6A4591EDB180
59 6BF6D40AF15DE2D7E295EAFB9574BBF8
All of them are named like _user_hive_warehouse_file_name_000XYZ_A. These files were downloaded from another HDFS cluster.
I'm trying to create an external table in Hive:
CREATE EXTERNAL TABLE users(
id int,
user string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/path/to/directory/';
It says:
OK
Time taken: 0.098 seconds
select * from users; returns empty.
select count(1) from users; returns 0.
Hive creates the table successfully, but it's always empty. If I put another file like another.txt, that contains the sample data mentioned above, select count(1) from users; returns 3.
What am I missing? Why is the table empty?
Environment:
JDK 7
Hadoop 2.6.0
Hive 0.14.0
Ubuntu 14.04
I think you are encountering an issue that is peripherally discussed in HIVE-6431. In particular, this comment is the important one:
By default, FileInputFormat(which is the super class of various formats) in hadoop ignores file name starts with "_" or ".", and hard to walk around this in hive codebase.
The workaround is probably to avoid using filenames that begin with _ or .
When Hive reads a table (for example, in the MapReduce job it runs for a query), it reads the files under the table's HDFS path with an input format based on FileInputFormat. FileInputFormat has a hiddenFileFilter that ignores any file whose name starts with an underscore ("_") or a dot ("."). You can exclude additional files by registering your own PathFilter class with FileInputFormat.setInputPathFilter. Hadoop treats files whose names start with an underscore as "special" files for job output and logs (for example, the _SUCCESS marker), which is probably why they are ignored.
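Following the workaround of avoiding a leading _ or ., the files can be renamed in place. This sketch assumes the hdfs client is on PATH and that paths contain no spaces (it takes the last column of hdfs dfs -ls output). The function names and the "data" prefix are hypothetical choices.

```shell
# Mirrors the rule described above: Hadoop's hiddenFileFilter skips any
# entry whose name starts with "_" or ".".
is_hadoop_hidden() {
  case "$1" in
    _* | .*) return 0 ;;
    *) return 1 ;;
  esac
}

# Rename every hidden-looking entry directly under an HDFS directory by
# adding a "data" prefix, so the input format no longer skips it.
unhide_hdfs_files() {
  local dir="$1"
  # Data lines of `hdfs dfs -ls` have 8 columns; the path is the last one.
  # The "Found N items" header has fewer columns and is filtered out.
  hdfs dfs -ls "$dir" | awk 'NF >= 8 { print $NF }' |
    while read -r path; do
      name="${path##*/}"
      if is_hadoop_hidden "$name"; then
        hdfs dfs -mv "$path" "${path%/*}/data$name"
      fi
    done
}
```

Usage: unhide_hdfs_files /path/to/directory, after which select count(1) from users; should see the renamed files.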

Checking whether a table exists and loading data into HBase and Hive tables

I have data in HDFS, and I want to load that data into an HBase table and a Hive table.
I have written a bash shell script that runs a Pig script to load the data from HDFS into HBase and a Hive script to load the data from HDFS into a Hive table; both work perfectly. All my HDFS data files have the same structure, and I'm loading all of them into a single HBase table and a single Hive table.
My problem: if more data files arrive in the HDFS directory and I run the shell script again, it tries to create the HBase and Hive tables again with the same names and reports that the table already exists. How can I write the Hive and HBase commands so that they first check whether the table exists? If it doesn't exist, the script should create it and load the data from HDFS into the HBase and Hive tables. If it already exists, the script should just insert the new data into the existing tables, without overwriting the data already in them.
How can this be done?
Below is my script file: myScript.sh
echo "create 'goodtable','gt'" | hbase shell
pig -f a.pig -param input=/user/user/d/
hive -f h.hql
Where a.pig :
G = LOAD '$input' USING PigStorage(',') as (c1:chararray, c2:chararray,c3:chararray,c4:chararray,c5:chararray);
STORE G INTO 'hbase://goodtable' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('gt:name gt:state gt:phone_no gt:gender');
h.hql:
create external table hive_table(
id int,
name string,
state string,
phone_no int,
gender string) row format delimited fields terminated by ',' stored as textfile;
LOAD DATA INPATH '/user/user/d/' INTO TABLE hive_table;
I just wanted to add an example for HBase as Hive was already covered before:
if [[ $(echo "exists 'goodtable'" | hbase shell | grep 'not exist') ]];
then
echo "create 'goodtable','gt'" | hbase shell;
fi
For Hive, you can add IF NOT EXISTS to the CREATE TABLE statement, as described in the Hive DDL documentation.
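A sketch of the Hive half of myScript.sh along those lines: IF NOT EXISTS makes the CREATE a no-op on later runs, and LOAD DATA ... INTO TABLE without OVERWRITE appends rather than replaces. It assumes the hive CLI is on PATH and reuses the column list from h.hql; the function name load_into_hive_table is hypothetical. Because LOAD DATA INPATH moves the source files into the table's directory, each run only loads files that arrived since the previous run.

```shell
# Hypothetical Hive step for myScript.sh: create the table only if it is
# missing, then append whatever files are currently in the input directory.
load_into_hive_table() {
  local input_dir="$1"
  hive -e "
CREATE EXTERNAL TABLE IF NOT EXISTS hive_table (
  id INT, name STRING, state STRING, phone_no INT, gender STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
LOAD DATA INPATH '$input_dir' INTO TABLE hive_table;"
}
```

Usage: in myScript.sh, replace hive -f h.hql with load_into_hive_table /user/user/d/.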
I don't have much experience with HBase, but I believe you can use the exists command to check whether the table exists, and then create the table if it doesn't.
@visakh is correct: you can check whether a table exists in HBase by entering the HBase shell and typing: exists '<tablename>'
In order to do this without entering the HBase shell interactively, you can create a simple ruby script such as the following:
exists 'mytable'
exit
Let's say you save this to a file called tabletest.rb. You can then execute this script by calling hbase shell tabletest.rb. This will create the following output, which you can then parse from your shell script:
Table tableisthere does exist
0 row(s) in 0.9830 seconds
OR
Table tableisNOTthere does not exist
0 row(s) in 0.9830 seconds
Adding more details for an 'all in one' script:
Alternatively, you can write a more advanced Ruby script that checks whether the table exists and creates it if needed, by calling the HBaseAdmin Java API from within the Ruby script.
conf = HBaseConfiguration.new
hbaseAdmin = HBaseAdmin.new(conf)
if !hbaseAdmin.tableExists('mytable')
hbaseAdmin.createTable('mytable',...)
end

Hive error: parseexception missing EOF

I am not sure what I am doing wrong here:
hive> CREATE TABLE default.testtbl(int1 INT,string1 STRING)
stored as orc
tblproperties ("orc.compress"="NONE")
LOCATION "/user/hive/test_table";
FAILED: ParseException line 1:107 missing EOF at 'LOCATION' near ')'
while the following query works perfectly fine:
hive> CREATE TABLE default.testtbl(int1 INT,string1 STRING)
stored as orc
tblproperties ("orc.compress"="NONE");
OK
Time taken: 0.106 seconds
Am I missing something here? Any pointers would help. Thanks!
Try putting LOCATION before TBLPROPERTIES, as below; that worked for me.
CREATE TABLE default.testtbl(int1 INT,string1 STRING)
stored as orc
LOCATION "/user/hive/test_table"
tblproperties ("orc.compress"="NONE");
It seems even the sample SQL in the book "Programming Hive" gets the order wrong. Please refer to the official definition of the CREATE TABLE command:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable
@Haiying Wang pointed out that LOCATION must come before TBLPROPERTIES.
But I think the error also occurs when LOCATION is placed before STORED AS.
It's better to stick to the documented order:
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name -- (Note: TEMPORARY available in Hive 0.14.0 and later)
[(col_name data_type [COMMENT col_comment], ... [constraint_specification])]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[SKEWED BY (col_name, col_name, ...) -- (Note: Available in Hive 0.10.0 and later)]
ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
[STORED AS DIRECTORIES]
[
[ROW FORMAT row_format]
[STORED AS file_format]
| STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] -- (Note: Available in Hive 0.6.0 and later)
]
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)] -- (Note: Available in Hive 0.6.0 and later)
[AS select_statement]; -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)
Refer: Hive Create Table
Check this post:
Loading Data from a .txt file to Table Stored as ORC in Hive
Also check the source files in the specified directory /user/hive/test_table. If the files are in .txt or some other non-ORC format, you can follow the steps in the post above to get past the error.
ParseException line lineNumber missing EOF at '.' near 'schemaName':
I got the above error while running the following command from a Linux script to truncate a Hive table:
dse -u username -p password hive -e "truncate table keyspace.tablename;"
Fix:
Run a separate use command first, within the same -e string:
dse -u username -p password hive -e "use keyspace; truncate table keyspace.tablename;"
Happy coding!
I got the same error while creating a table in Hive.
I dropped the table with the DROP command and then ran my CREATE TABLE command again.
That worked for me.
If you see this error when running HiveQL from a file with "hive -f file.hql", and it points at the first line of a query, it is almost certainly caused by a missing semicolon (;) at the end of the previous query.
The parser expects a semicolon (;) to terminate each query.
for example:
DROP TABLE IF EXISTS default.emp
create table default.emp (
field1 type,
field2 type)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 's3://gts-promocube/source-data/Lowes/POS/';
If you save the above in a file and execute it with hive -f, then you'll get the error:
FAILED: ParseException line 2:0 missing EOF at 'CREATE' near emp.
Solution: Put a semicolon(;) for the DROP TABLE command above.
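Because this mistake is easy to miss in a long script, a rough pre-flight check can help before running hive -f. The sketch below is a heuristic, not a Hive feature: it only looks at lines that start a CREATE, DROP, ALTER, LOAD, USE or TRUNCATE statement, and a "--" inside a string literal can confuse it. The function name check_hql_semicolons is hypothetical.

```shell
# Heuristic check: report any line that starts a new statement while the
# previous non-blank line does not end with ";". Exits 1 if anything is found.
check_hql_semicolons() {
  awk '
    {
      line = $0
      sub(/--.*/, "", line)                 # drop -- comments
      gsub(/^[ \t]+|[ \t\r]+$/, "", line)    # trim surrounding whitespace
    }
    line == "" { next }                     # blank lines do not reset prev
    toupper(line) ~ /^(CREATE|DROP|ALTER|LOAD|USE|TRUNCATE)[ \t]/ && prev != "" && prev !~ /;$/ {
      printf "%s:%d: statement before this line may be missing a semicolon\n", FILENAME, NR
      found = 1
    }
    { prev = line }
    END { exit found }
  ' "$1"
}
```

Usage: check_hql_semicolons file.hql && hive -f file.hql. On the example above it reports line 2, the same line the ParseException points at.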
