Hive unable to read decimal value from HDFS - hadoop

My Hive version is 0.13.
I have a file that contains decimal values and a few other data types. This file was obtained after performing some Pig transformations. I created a Hive table on top of this HDFS file. When I do a select * from table_name, I find that the decimal values in the file are truncated to integer values. What could be the reason for this?
Below is my table:
CREATE TABLE FSTUDENT(
load_dte string COMMENT 'DATE/TIME OF FILE CREATION',
xyz DECIMAL,
student_id int
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\u0001'
LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 'hdfs://clsuter1/tmp/neethu/part-m-00000';
The output of select * from table_name shows the decimal value 1387.00000 as 1387.
Any help?
Thanks.

@Neethu: Altering the table would not make any difference unless it is an external table.
As @K S Nidhin mentioned, as of Hive 0.13 users can specify scale and precision when creating tables with the DECIMAL datatype using a DECIMAL(precision, scale) syntax. If the scale is not specified, it defaults to 0 (no fractional digits). If no precision is specified, it defaults to 10. You can find the same in the Hive docs.
Try dropping the table FSTUDENT and recreating it with DECIMAL(precision, scale). Something like:
CREATE TABLE FSTUDENT(
load_dte STRING,
xyz DECIMAL(10,5), -- in your case
student_id INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\u0001'
LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
or
truncate the table / insert overwrite the data into the table after altering the column datatype. Hope this helps!
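If you take the altering route, the column change itself would look like this (a sketch using the table and column names from the question; the ALTER only updates metadata, so the existing text data is reparsed with the new type on the next read):
ALTER TABLE FSTUDENT CHANGE xyz xyz DECIMAL(10,5); -- old name, new name, new type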

The issue is that you haven't specified the precision and scale.
DECIMAL without them defaults to DECIMAL(10,0), i.e. no fractional digits.
So you have to add the scale to get the required value.
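To see the default in action, a quick illustration (runnable in a Hive 0.13+ session, which supports FROM-less SELECT):
SELECT CAST('1387.00000' AS DECIMAL),        -- 1387: the default scale of 0 drops the fraction
       CAST('1387.00000' AS DECIMAL(10,5));  -- 1387.00000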

Related

How can I create a partitioned table that is semicolon-separated and has commas as decimal points?

I'm having problems with this type of table:
manager; sales
charles; 100,1
ferdand; 212,6
aldalbert; 23,4
chuck; 41,6
I'm using the code below to create and define the partitioned table:
CREATE TABLE db.table
(
manager string,
sales string
)
partitioned by (file_type string)
row format delimited fields terminated by ';'
lines terminated by '\n'
tblproperties ("skip.header.line.count"="1");
Afterwards, I'm using a regex to replace the commas with dots and then convert the sales field to a numeric datatype.
I wonder if there is a better solution than that.
Other than using Spark or Pig to clean the data as well as load the Hive table, no; you'll need to replace and cast the sales column within HiveQL to get the format you want.
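A minimal sketch of that replace-and-cast step (table and column names taken from the question; the target type DECIMAL(10,1) is an assumption based on the sample data):
SELECT manager,
       CAST(regexp_replace(sales, ',', '.') AS DECIMAL(10,1)) AS sales -- swap decimal comma for a dot, then cast
FROM db.table;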

Can we use TEXT FILE format for Hive Table with Snappy compression?

I have a Hive external table in HDFS and I am trying to create a Hive managed table on top of it. I am using the TEXTFILE format with Snappy compression, but I want to know how it helps the table.
CREATE TABLE standard_cd
(
last_update_dttm TIMESTAMP,
last_operation_type CHAR (1) ,
source_commit_dttm TIMESTAMP,
transaction_dttm TIMESTAMP ,
transaction_type CHAR (1)
)
PARTITIONED BY (process_dt DATE)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
TBLPROPERTIES ("orc.compress" = "SNAPPY");
Let me know if there are any issues in creating it in this format.
As such, there is no issue while creating it,
but there is a difference in the table properties:
Table created and stored as TEXTFILE: (table properties screenshot in the original answer)
Table created and stored as ORC: (table properties screenshot in the original answer)
although the size of both tables was the same after loading some data.
Also check the documentation about the ORC file format.
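One thing worth noting: "orc.compress" is an ORC-specific table property and has no effect on a TEXTFILE table. As a sketch, Snappy compression for text output is usually enabled through session settings at write time rather than through TBLPROPERTIES:
SET hive.exec.compress.output=true;
SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;
-- subsequent INSERTs into the TEXTFILE table will then write Snappy-compressed text files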

How to create a Hive table for specially formatted data

I have text files that I want to load into a Hive table.
The format of the data is like below:
Id|^|SegmId|^|geographyId|^|Sequence|^|Subtracted|^|FFAction|!|
4295875876|^|3|^|110170|^|1|^|False|^|I|!|
4295876137|^|2|^|110170|^|1|^|False|^|I|!|
4295876137|^|8|^|100219|^|1|^|False|^|I|!|
I want to create a table in Hive for this kind of data.
Can you please suggest how to create a table for this?
This is what I have tried, but I'm getting NULL (please also suggest the data types for the columns):
create table if not exists GeographicSegment
(
Id int,
SegId int,
geographyId int,
Sequence int,
Subtracted String,
FFAction String
) row format delimited fields terminated by '|!|' LINES TERMINATED BY '\n' ;
This has worked for me:
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe' WITH SERDEPROPERTIES ("field.delim"="|^|")
It seems that your fields are terminated by '|^|' and your lines are terminated by '|!|\n'.
Hive does not support multi-character delimiters out of the box; you can find a way to handle it in the linked solution.
Regarding the data types, what you are doing is correct except for the first column, Id: the values present exceed the range of INT, so it should be BIGINT.
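Putting the two answers together, a sketch of a DDL that could work (assuming the hive-contrib MultiDelimitSerDe is available on your cluster; note the trailing '|!|' row marker will still land in the last column and may need trimming):
CREATE TABLE IF NOT EXISTS GeographicSegment
(
Id BIGINT, -- values exceed the INT range, per the answer above
SegId INT,
geographyId INT,
Sequence INT,
Subtracted STRING,
FFAction STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
WITH SERDEPROPERTIES ("field.delim"="|^|");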

printing null values in hive while declaring the column as decimal

I declared a column as decimal. The data looks like this:
4500.00
5321.00
532.00
column definition: area DECIMAL(9,2)
but in Hive it shows like this:
NULL
NULL
If I declare the column as a string, it works fine, but I need it as decimal.
I believe this could be a delimiter mismatch on the table.
If you are using an external table, you may have configured a delimiter different from the one that actually exists in the file.
Please try to find the actual delimiter and alter the table using the command below:
alter table <table_name> set SERDEPROPERTIES ('field.delim' = '<actual_delimiter>');
Or set the delimiter in the CREATE TABLE statement, for example:
CREATE TABLE figure(length DOUBLE, width DOUBLE, area DOUBLE) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;
Replace '\t' with the actual delimiter from your data file.

Share data between hive and hadoop streaming-api output

I have several Hadoop streaming API programs that produce output with this output format:
"org.apache.hadoop.mapred.SequenceFileOutputFormat"
The streaming API programs can read the file with the input format "org.apache.hadoop.mapred.SequenceFileAsTextInputFormat".
The data in the output file looks like this:
val1-1,val1-2,val1-3
val2-1,val2-2,val2-3
val3-1,val3-2,val3-3
Now I want to read the output with Hive. I created a table with this script:
CREATE EXTERNAL
TABLE IF NOT EXISTS table1
(
col1 int,
col2 string,
col3 int
)
PARTITIONED BY (year STRING,month STRING,day STRING,hour STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileAsTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileOutputFormat'
LOCATION '/hive/table1';
When I query the data with
select * from table1
the result is:
val1-2,val1-3
val2-2,val2-3
val3-2,val3-3
It seems the first column has been ignored. I think Hive just uses the values as output, not the keys. Any ideas?
You are correct. One of the limitations of Hive right now is that it ignores the keys from the sequence file format. By "right now" I am referring to Hive 0.7, but I believe it's a limitation of Hive 0.8 and Hive 0.9 as well.
To circumvent this, you might have to create a new input format for which the key is null and the value is the combination of your present key and value. Sorry, I know this was not the answer you were looking for!
It should be FIELDS TERMINATED BY ',' instead of FIELDS TERMINATED BY '\t', I think.
