Hive error: ParseException missing EOF - hadoop

I am not sure what I am doing wrong here:
hive> CREATE TABLE default.testtbl(int1 INT,string1 STRING)
stored as orc
tblproperties ("orc.compress"="NONE")
LOCATION "/user/hive/test_table";
FAILED: ParseException line 1:107 missing EOF at 'LOCATION' near ')'
while the following query works perfectly fine:
hive> CREATE TABLE default.testtbl(int1 INT,string1 STRING)
stored as orc
tblproperties ("orc.compress"="NONE");
OK
Time taken: 0.106 seconds
Am I missing something here? Any pointers will help. Thanks!

Try putting the "LOCATION" in front of "tblproperties", like below; it worked for me.
CREATE TABLE default.testtbl(int1 INT,string1 STRING)
stored as orc
LOCATION "/user/hive/test_table"
tblproperties ("orc.compress"="NONE");
It seems even the sample SQL from the book "Programming Hive" got the order wrong. Please refer to the official definition of the CREATE TABLE command:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable

@Haiying Wang pointed out that LOCATION is to be put in front of TBLPROPERTIES.
But I think the error also occurs when LOCATION is specified above STORED AS.
It's better to stick to the correct order:
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name -- (Note: TEMPORARY available in Hive 0.14.0 and later)
  [(col_name data_type [COMMENT col_comment], ... [constraint_specification])]
  [COMMENT table_comment]
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
  [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
  [SKEWED BY (col_name, col_name, ...) -- (Note: Available in Hive 0.10.0 and later)]
     ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
     [STORED AS DIRECTORIES]
  [
   [ROW FORMAT row_format]
   [STORED AS file_format]
     | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] -- (Note: Available in Hive 0.6.0 and later)
  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)] -- (Note: Available in Hive 0.6.0 and later)
  [AS select_statement]; -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)
Refer: Hive Create Table

Check this post:
Loading Data from a .txt file to Table Stored as ORC in Hive
Also check the source files present at the specified directory /user/hive/test_table. In case the files are in .txt or some other non-ORC format, you can follow the steps in the above post to resolve the error; a sketch of that approach is below.
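A minimal sketch of those steps, assuming the files under /user/hive/test_table are comma-delimited text (the staging table name, delimiter, and ORC table name here are illustrative, not from the post): expose the text files through a staging table, then rewrite them into an ORC table.
-- staging table over the existing delimited text files (column layout and delimiter assumed)
CREATE EXTERNAL TABLE default.testtbl_txt(int1 INT, string1 STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/hive/test_table';
-- ORC table at its default warehouse location
CREATE TABLE default.testtbl_orc(int1 INT, string1 STRING)
STORED AS ORC
TBLPROPERTIES ("orc.compress"="NONE");
-- rewrite the text data as ORC
INSERT OVERWRITE TABLE default.testtbl_orc
SELECT int1, string1 FROM default.testtbl_txt;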

ParseException line lineNumber missing EOF at '.' near 'schemaName':
Got the above error while trying to execute the following command from a Linux script to truncate a Hive table:
dse -u username -p password hive -e "truncate table keyspace.tablename;"
Fix:
You need to separate the commands within the script line as follows:
dse -u username -p password hive -e "use keyspace; truncate table keyspace.tablename;"
Happy coding!

Got the same error while creating a table in hive.
I used the drop command to drop the table and then ran the create table command that I had again.
Worked for me.

If you see this error when running your HiveQL from a file with the command "hive -f file.hql", and it points to the first line of your query, it is most likely because of a forgotten semicolon (;) at the end of the previous query, since the parser looks for a semicolon as the terminator of each query.
For example:
DROP TABLE IF EXISTS default.emp
create table default.emp (
field1 type,
field2 type)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 's3://gts-promocube/source-data/Lowes/POS/';
If you save the above in a file and execute it with hive -f, you'll get the error:
FAILED: ParseException line 2:0 missing EOF at 'CREATE' near emp.
Solution: put a semicolon (;) after the DROP TABLE statement above.
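That is, the first statement in the file becomes:
DROP TABLE IF EXISTS default.emp;
with the rest of the file unchanged.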

Related

Hive one line command to catch SCHEMA + TABLE NAME info

Is there a way to catch all schema + table name info in a single command through Hive in a similar way to
SELECT * FROM information_schema.tables
from the PostgreSQL world?
show databases and show tables combined in a loop (a sketch of such a loop is shown below) is an answer, but I'm looking for a more compact way to get the same result in a single command.
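For reference, the loop approach mentioned in the question could look roughly like this shell sketch (the script itself is illustrative, not from the question):
#!/bin/bash
# print every database.table pair, one per line
for db in $(hive -e "show databases;" 2>/dev/null); do
  for tbl in $(hive -e "use $db; show tables;" 2>/dev/null); do
    echo "$db.$tbl"
  done
done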
It's been a while since I worked on Hive queries, but as far as I remember you can use
hive> desc formatted tableName;
or
hive> describe formatted tableName;
It will give you all the relevant information about the table, like the schema, partition info, and table type (Managed Table, etc.).
I am not sure if this is exactly what you are looking for.
Another way to query Hive tables is to write Hive scripts, which can be called from the Hadoop terminal rather than from the Hive terminal itself.
std]$ cat sample.hql or vi sample.hql
use dbName;
select * from tableName;
desc formatted tableName;
# this hql script can be called from outside the hive terminal
std]$ hive -f sample.hql
or, without even having to write a script file, you can query Hive as
std]$ hive -e "use dbName; select * from emp;" > text.txt or >> to append
At the database level, you can query as:
hive> use dbName;
hive> set hive.cli.print.current.db=true;
hive(dbName)> describe database dbName;
It will bring metadata about the database from the metastore (MySQL in this case). If a single compact query is really needed, another option is to query the metastore database directly, as sketched below.
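A sketch of that metastore query, assuming a MySQL-backed metastore with the default schema (the DBS and TBLS tables; names can differ between Hive versions), run against the metastore database rather than in the Hive CLI:
SELECT d.NAME AS db_name, t.TBL_NAME AS table_name
FROM DBS d
JOIN TBLS t ON t.DB_ID = d.DB_ID
ORDER BY d.NAME, t.TBL_NAME;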

Error Copying data from HDFS to External Table In Hive

I am trying to insert data from HDFS into an external table in Hive, but I am getting the below error.
Error:
Usage: java FsShell [-put <localsrc> ... <dst>]
Command failed with exit code = 255
Command
hive> !hadoop fs -put /myfolder/logs/pv_ext/2013/08/11/log/data/Sacramentorealestatetransactions.csv
> ;
Edited:
file location : /yapstone/logs/pv_ext/somedatafor_7_11/Sacramentorealestatetransactions.csv
table location : hdfs://sandbox:8020/yapstone/logs/pv_ext/2013/08/11/log/data
I am in Hive, executing the command:
!hadoop fs -put /yapstone/logs/pv_ext/somedatafor_7_11/Sacramentorealestatetransactions.csv hdfs://sandbox:8020/yapstone/logs/pv_ext/2013/08/11/log/data
getting error :
put: File /yapstone/logs/pv_ext/somedatafor_7_11/Sacramentorealestatetransactions.csv does not exist.
Command failed with exit code = 255
Please share your suggestion.
Thanks
Here are two methods to load data into the external Hive table.
Method 1:
a) Get the location of the HDFS folder for the Hive external table.
hive> desc formatted mytable;
b) Note the value for the Location property in output. Say, it is hdfs:///hive-data/mydata
c) Then, put the file from local disk to HDFS
$ hadoop fs -put /location/of/data/file.csv hdfs:///hive-data/mydata
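You can then verify that the file landed in the table's directory before querying it:
$ hadoop fs -ls hdfs:///hive-data/mydata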
Method 2:
a) Load data via this Hive command
hive > LOAD DATA LOCAL INPATH '/location/of/data/file.csv' INTO TABLE mytable;
One more method: change the Hive table location.
alter table table_name set location='hdfs://your_data/folder';
This method may serve you better.
You need to create the table in Hive first.
hive> CREATE EXTERNAL TABLE IF NOT EXISTS mytable(myid INT, a1 STRING, a2 STRING....)
row format delimited fields terminated by '\t' stored as textfile LOCATION
'hdfs://sandbox:8020/yapstone/logs/pv_ext/2013/08/11/log/data';
Then load the data from HDFS into the Hive table. Note that the path must be quoted:
hive> LOAD DATA INPATH '/yapstone/logs/pv_ext/somedatafor_7_11/Sacramentorealestatetransactions.csv' INTO TABLE mytable;
NOTE: If you load data from HDFS into Hive with INPATH, the data will be moved from the HDFS location into Hive, so the data won't be available at the original HDFS location afterwards. If you need to keep it, see the sketch below.
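One way to keep the original file (an illustrative sketch, not part of the answer above; the copy path is hypothetical) is to copy it first and load the copy:
$ hadoop fs -cp /yapstone/logs/pv_ext/somedatafor_7_11/Sacramentorealestatetransactions.csv /tmp/sacramento_copy.csv
hive> LOAD DATA INPATH '/tmp/sacramento_copy.csv' INTO TABLE mytable;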
Check if the data loaded successfully.
hive> SELECT * FROM mytable;

MSCK repair table failing for schema tables

My hive table name is in the below format:
schema_name.hive_table_name
eg: schema1.abc;
Now when I try to do MSCK REPAIR TABLE on the above Hive table, it throws the below error.
Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
FAILED: ParseException line 1:28 missing EOF at '.' near 'schema_name'
Below is the command I used:
hive -e "MSCK repair table schema_name.hive_table_name"
Could anyone help with this?
I tried the below statement:
hive -e "use schema_name;MSCK repair table hive_table_name"
This adds the partitions to the Hive table under the specific schema mentioned.
It worked for me.
Thanks

add date time from flat file name cloudera

I started an EC2 cluster on Amazon to install Cloudera. I got it installed and configured, and loaded some of the Wiki Page Views public snapshot into HDFS. The structure of the files is as follows:
projectcode, pagename, pageviews, bytes
The files are named as follows, with the date and time embedded in the name:
pagecounts-20090430-230000.gz
when loading the data from HDFS to Impala, I do it as such:
CREATE EXTERNAL TABLE wikiPgvws
(
project_code varchar(100),
page_name varchar(1000),
page_views int,
page_bytes int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
LOCATION '/user/hdfs';
One thing I missed is the date and time of each file. The directory
/user/hdfs
contains multiple pagecount files associated with different dates and times. How can one pull that information and store it in a column when loading into Impala?
I think the thing you are missing is the concept of partitions. If you define the table as partitioned, the data can be divided into different partitions based on the timestamp in the file name. I was able to work it out in Hive; I hope you can do the needful (if any) for Impala, as the query syntax is the same.
For me, this problem was not possible to solve using Hive alone, so I mixed bash with Hive scripting, and it works fine for me. This is how I wrapped it up:
1. Create table wikiPgvws with a partition
2. Create table wikiTmp with the same fields as wikiPgvws, except for the partition
3. For each file:
   i. Load data into wikiTmp
   ii. grep the timestamp from the file name
   iii. Use sed to replace placeholders in a predefined hql script file that loads the data into the actual table, then run it
4. Drop table wikiTmp and remove tmp.hql
The script is as follows:
#!/bin/bash
hive -e "CREATE EXTERNAL TABLE wikiPgvws(
project_code varchar(100),
page_name varchar(1000),
page_views int,
page_bytes int
)
PARTITIONED BY(dts STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
STORED AS TEXTFILE";
hive -e "CREATE TABLE wikiTmp(
project_code varchar(100),
page_name varchar(1000),
page_views int,
page_bytes int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
STORED AS TEXTFILE"
for fileName in $(hadoop fs -ls /user/hdfs/bounty/pagecounts-*.txt | grep -Po '(?<=\s)(/user.*$)')
do
echo "currentFile :$fileName"
dst=$(echo $fileName | grep -oE '[0-9]{8}-[0-9]{6}')  # extract the yyyymmdd-hhmmss stamp from the file name
echo "currentStamp $dst"
sed "s!sourceFile!'$fileName'!" t.hql > tmp.hql
sed -i "s!targetPartition!$dst!" tmp.hql
hive -f tmp.hql
done
hive -e "DROP TABLE wikiTmp"
rm -f tmp.hql
The hql script consists of just two lines:
LOAD DATA INPATH sourceFile OVERWRITE INTO TABLE wikiTmp;
INSERT OVERWRITE TABLE wikiPgvws PARTITION (dts = 'targetPartition') SELECT w.* FROM wikiTmp w;
Epilogue:
Check whether options equivalent to hive -e and hive -f are available in Impala; without them, this script is of no use to you. Also, the grep commands that fetch the fileName and timestamp need to be adjusted to your table location and stamp pattern. It's just one way to show how the job can be done; I couldn't find another one.
Enhancement
If everything works well, consider merging the first two DDLs into another script to make it look cleaner. I'm not sure whether hql script arguments can be used to define partition values, but you can have a look at them as a replacement for sed; a sketch is below.
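For what it's worth, the Hive CLI does support variable substitution via --hivevar (whether Impala offers an equivalent would need checking), which could replace the sed step. A sketch under that assumption, reusing the fileName and dst variables from the loop above:
hive --hivevar sourceFile="$fileName" --hivevar dts="$dst" -f t.hql
with t.hql rewritten to reference the variables:
LOAD DATA INPATH '${hivevar:sourceFile}' OVERWRITE INTO TABLE wikiTmp;
INSERT OVERWRITE TABLE wikiPgvws PARTITION (dts = '${hivevar:dts}') SELECT w.* FROM wikiTmp w;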

Hive error when creating an external table (state=08S01,code=1)

I'm trying to create an external table in Hive, but keep getting the following error:
create external table foobar (a STRING, b STRING) row format delimited fields terminated by "\t" stored as textfile location "/tmp/hive_test_1375711405.45852.txt";
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1)
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1)
Aborting command set because "force" is false and command failed: "create external table foobar (a STRING, b STRING) row format delimited fields terminated by "\t" stored as textfile location "/tmp/hive_test_1375711405.45852.txt";"
The contents of /tmp/hive_test_1375711405.45852.txt are:
abc\tdef
I'm connecting via the beeline command line interface, which uses Thrift HiveServer2.
System:
Hadoop 2.0.0-cdh4.3.0
Hive 0.10.0-cdh4.3.0
Beeline 0.10.0-cdh4.3.0
Client OS - Red Hat Enterprise Linux Server release 6.4 (Santiago)
The issue was that I was pointing the external table at a file in HDFS instead of a directory. The cryptic Hive error message really threw me off.
The solution is to create a directory and put the data file in there. To fix the above example, you'd create a directory /tmp/foobar and place hive_test_1375711405.45852.txt in it. Then create the table like so:
create external table foobar (a STRING, b STRING) row format delimited fields terminated by "\t" stored as textfile location "/tmp/foobar";
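For reference, the HDFS setup before creating the table could look like this (assuming the data file is already in HDFS at the old path):
$ hadoop fs -mkdir /tmp/foobar
$ hadoop fs -mv /tmp/hive_test_1375711405.45852.txt /tmp/foobar/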
We faced a similar problem in our company (a Sentry, Hive, and Kerberos combination). We solved it by removing all privileges granted on a non-fully-qualified hdfs_url. For example, we changed GRANT ALL ON URI '/user/test' TO ROLE test; to GRANT ALL ON URI 'hdfs-ha-name:///user/test' TO ROLE test;.
You can find the privileges for a specific URI in the Hive database (mysql in our case).
