I'm trying to load some test data into a simple Hive table. The data is comma-separated, but the individual elements are not enclosed in double quotes, and I'm getting an error because of this. How do I tell Hive not to expect varchar fields to be enclosed in quotes? Manually adding quotes to the varchar fields is not an option, since the input file I'm trying to use has thousands of records. Sample query and data are below.
create table mydatabase.flights(FlightDate varchar(10),Airline int,FlightNum int,Origin varchar(4),Destination varchar(4),Departure varchar(4),DepDelay double,Arrival varchar(4),ArrivalDelay double,Airtime double,Distance double) row format delimited;
insert into mydatabase.flights(FlightDate,Airline,FlightNum,Origin,Destination,Departure,DepDelay,Arrival,ArrivalDelay,Airtime,Distance)
values(2014-04-01,19805,1,JFK,LAX,0854,-6.00,1217,2.00,355.00,2475.00);
The insert query above gives me an error message. It works fine if I enclose the varchar fields in quotes.
Error while compiling statement: FAILED: ParseException line 11:11 mismatched input '-' expecting ) near '2014' in value row constructor
I'm loading the data using the following query:
load data inpath '/user/alpsusa/hive/flights.csv' overwrite into table mydatabase.flights;
After the load, I see only the first field being populated; the rest are all NULL.
Sample data
2014-04-01,19805,1,JFK,LAX,0854,-6.00,1217,2.00,355.00,2475.00
2014-04-01,19805,2,LAX,JFK,0944,14.00,1736,-29.00,269.00,2475.00
2014-04-01,19805,3,JFK,LAX,1224,-6.00,1614,39.00,371.00,2475.00
2014-04-01,19805,4,LAX,JFK,1240,25.00,2028,-27.00,264.00,2475.00
2014-04-01,19805,5,DFW,HNL,1300,-5.00,1650,15.00,510.00,3784.00
Below is the output of DESCRIBE FORMATTED
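For what it's worth, ROW FORMAT DELIMITED without a FIELDS TERMINATED BY clause uses Ctrl-A (\001) as the default field delimiter, which would explain why only the first field of a comma-separated file gets populated. A hedged sketch of the same table declared with an explicit comma delimiter (column names taken from the DDL above):
create table mydatabase.flights(
FlightDate varchar(10),
Airline int,
FlightNum int,
Origin varchar(4),
Destination varchar(4),
Departure varchar(4),
DepDelay double,
Arrival varchar(4),
ArrivalDelay double,
Airtime double,
Distance double)
row format delimited
fields terminated by ','
stored as textfile;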
I need to create an external Hive table from an HDFS location where one column in the files has a reserved name (end).
When running the script I get the error:
"cannot recognize input near 'end' 'STRUCT' '<' in column specification"
I found 2 solutions.
The first one is to set hive.support.sql11.reserved.keywords=false, but this option has been removed.
https://issues.apache.org/jira/browse/HIVE-14872
The second solution is to use quoted identifiers (hive.support.quoted.identifiers=column).
But in this case I get the error:
"org.apache.hadoop.hive.serde2.SerDeException: org.codehaus.jackson.JsonParseException: Unexpected character ('c' (code 99)): was expecting comma to separate OBJECT entries"
This is my code for table creation:
CREATE TEMPORARY EXTERNAL TABLE ${tmp_db}.${tmp_table}
(
id STRING,
email STRUCT<string:STRING>,
start STRUCT<long:BIGINT>,
end STRUCT<long:BIGINT>
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION '${input_dir}';
It's not possible to rename the column.
Does anybody know the solution to this problem, or have any ideas?
Thanks a lot in advance!
Can you try the following?
hive> set hive.support.quoted.identifiers=column;
hive> create temporary table sp_char ( `#` int, `end` string);
OK
Time taken: 0.123 seconds
OK
Time taken: 0.362 seconds
hive>
When you set the Hive property hive.support.quoted.identifiers=column, anything within backticks is treated as a literal column name, so reserved words such as end can be used as identifiers.
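Applied to the table from the question, a hedged sketch would look like the following (assuming the underlying JSON itself is valid; the backticks are the only change to the original DDL):
SET hive.support.quoted.identifiers=column;
CREATE TEMPORARY EXTERNAL TABLE ${tmp_db}.${tmp_table}
(
id STRING,
email STRUCT<string:STRING>,
`start` STRUCT<long:BIGINT>,
`end` STRUCT<long:BIGINT>
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION '${input_dir}';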
The other value for this property is none; when it is set to none, a backticked expression is instead evaluated as a regular expression over column names.
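A minimal sketch of the regex behaviour under none (the pattern is illustrative and matches the end column of the sp_char table created above):
SET hive.support.quoted.identifiers=none;
-- with none, a backticked name in the select list is treated as a regular expression over column names
SELECT `e.*` FROM sp_char;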
Hope this helps
I am facing difficulties in getting a dump (a text file delimited by '^') of a query in Hive for my project, sentiment analysis of the stock market using Twitter.
The query whose output I need in HDFS or the local file system is given below:
hive> select t.cmpname,t.datecol,t.tweet,st.diff FROM tweet t LEFT OUTER JOIN stock st ON(t.datecol = st.datecol AND lower(t.cmpname) = lower(st.cmpname));
The query produces the correct output, but when I try dumping it to HDFS I get an error.
I went through various solutions for dumping given on Stack Overflow, but I was not able to find one that suits my case.
Thanks for your help.
INSERT OVERWRITE DIRECTORY '/path/to/dir'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '^'
SELECT t.cmpname,t.datecol,t.tweet,st.diff FROM tweet t LEFT OUTER JOIN stock st
ON(t.datecol = st.datecol AND lower(t.cmpname) = lower(st.cmpname));
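If the dump is wanted on the local file system rather than HDFS, the LOCAL keyword works the same way; a sketch assuming Hive 0.11 or later (where ROW FORMAT on directory inserts is supported) and an illustrative output path:
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/tweet_stock_dump'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '^'
SELECT t.cmpname,t.datecol,t.tweet,st.diff FROM tweet t LEFT OUTER JOIN stock st
ON(t.datecol = st.datecol AND lower(t.cmpname) = lower(st.cmpname));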
I have a table with one column whose data type is 'timestamp'. Whenever I try to run queries on the table, even a simple select statement, I get errors.
An example of a value in my column:
'2014-01-01 05:05:20.664592-08'
The statement I am trying:
'select * from mytable limit 10;'
The error I am getting is:
'Failed with exception java.io.IOException:java.lang.NumberFormatException: For input string: "051-08000"'
Date functions in Hive like TO_DATE are also not working. If I change the data type to string, I am able to extract the date part using substring, but I need to work with timestamp.
Has anyone faced this error before? Please let me know.
Hive has trouble understanding '2014-01-01 05:05:20.664592-08' as a timestamp because of the trailing time-zone offset; the TIMESTAMP type expects yyyy-MM-dd HH:mm:ss[.fffffffff] with no offset. You should change the data type to string, cut off the offending portion with a string function, then cast back to timestamp:
select cast(substring(time_stamp_field,1,23) as timestamp) from mytable
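A minimal sketch of that approach end to end, keeping the raw value as a string and exposing the converted value through a view (the table and view names here are illustrative, not from the original post):
CREATE TABLE mytable_raw (time_stamp_field STRING);
CREATE VIEW mytable_ts AS
SELECT CAST(substring(time_stamp_field,1,23) AS timestamp) AS event_ts
FROM mytable_raw;
-- date functions such as TO_DATE(event_ts) then work on the converted column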
I am trying to achieve this:
location/11.11
location/12.11
location/13.11
In order to do that, I have tried many things and couldn't make it happen.
Now I have a Hive UDF which returns the location for the S3 table, but I am facing an error:
ParseException line 1:0 cannot recognize input near 'LOCATION'
'datenow' '(' LOCATION datenow(); NoViableAltException(143#[])
This is my Hive script; I have two external tables.
CREATE TEMPORARY FUNCTION datenow AS 'LocationUrlGenerator';
CREATE EXTERNAL TABLE IF NOT EXISTS s3( file Array<String>)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' COLLECTION ITEMS TERMINATED BY '\001' LINES TERMINATED BY '\n';
LOCATION datenow();
LOCATION accepts a string literal, not a UDF. The Language Manual is a bit unclear because it only specifies [LOCATION hdfs_path] and leaves hdfs_path undefined, but it can only be a URL location path, a string. In general, UDFs are not accepted in a DDL context.
Build a script with any text tool of choice and run that script.
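One hedged way to do that is Hive variable substitution: compute the path outside Hive and pass it in when invoking the script, e.g. hive --hivevar target_loc=s3://mybucket/location/13.11 -f create_s3.hql (the variable name, bucket and file name are assumptions), where create_s3.hql contains:
-- ${hivevar:target_loc} is substituted textually before parsing, so a literal path reaches LOCATION
CREATE EXTERNAL TABLE IF NOT EXISTS s3 (file Array<String>)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY '\001'
LINES TERMINATED BY '\n'
LOCATION '${hivevar:target_loc}';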
I managed it like this:
INSERT INTO TABLE S3
PARTITION(time)
SELECT func(json),from_unixtime(unix_timestamp(),'yyyy-MM-dd') AS time FROM tracksTable;
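Note that because the time partition column is populated dynamically from the select, dynamic partitioning has to be enabled first; a hedged sketch of the usual settings (defaults differ between Hive versions):
SET hive.exec.dynamic.partition=true;
-- nonstrict mode is needed when every partition column is dynamic, as it is here
SET hive.exec.dynamic.partition.mode=nonstrict;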