Identifying character delimiter in Hive - hadoop

Below is my data, which uses the thorn character (þ) as the delimiter:
1þNaveenþ"Bangalore ,"Karnataka"þ
2þNaveenþ"Srikanth ^ Karnatakaþ562114
My CREATE TABLE statement is below:
CREATE EXTERNAL TABLE adh_dev.delimiter_test (
  Number INT,
  Name STRING,
  Address STRING,
  Pincode INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\-61'
STORED AS TEXTFILE
LOCATION '/xyz/test_delimiter';
I tried the approaches below; nothing worked:
1) Followed this link: Thorn character delimiter is not recognized in Hive
2) Tried to put '-2'
3) Followed this link: http://www.theasciicode.com.ar/extended-ascii-code/capital-letter-thorn-ascii-code-232.html
4) Tried to put '\xFE'
I am using Cloudera CDH 5.11.1 and Hive 1.1.0. Please help me resolve this issue; I have been struggling with it for the past 3 days.
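A hedged sketch of one likely fix, not confirmed against this exact data: if the file is UTF-8 encoded, þ is two bytes (0xC3 0xBE), so the single-byte field delimiter used by ROW FORMAT DELIMITED can never match it. Hive 0.14 and later ship a contrib SerDe for multi-character delimiters, which can be pointed at the literal character (the table, columns, and location below are reused from the question):

-- a sketch, assuming the file is UTF-8, where þ is the two bytes 0xC3 0xBE
CREATE EXTERNAL TABLE adh_dev.delimiter_test (
  Number INT,
  Name STRING,
  Address STRING,
  Pincode INT
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
WITH SERDEPROPERTIES ('field.delim'='þ')
STORED AS TEXTFILE
LOCATION '/xyz/test_delimiter';

If the file is actually Latin-1 encoded, þ is the single byte 0xFE, and a plain ROW FORMAT DELIMITED clause with that byte as the delimiter should work instead.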

Related

Using SQL reserved words in Hive when creating external temporary table

I need to create an external Hive table over an HDFS location where one column in the files has the reserved name (end).
When running the script I get the error:
"cannot recognize input near 'end' 'STRUCT' '<' in column specification"
I found two solutions.
The first is to set hive.support.sql11.reserved.keywords=false, but this option has been removed:
https://issues.apache.org/jira/browse/HIVE-14872
The second is to use quoted identifiers (wrapping the column name in backticks).
But in this case I get the error:
"org.apache.hadoop.hive.serde2.SerDeException: org.codehaus.jackson.JsonParseException: Unexpected character ('c' (code 99)): was expecting comma to separate OBJECT entries"
This is my code for the table creation:
CREATE TEMPORARY EXTERNAL TABLE ${tmp_db}.${tmp_table}
(
  id STRING,
  email STRUCT<string:STRING>,
  start STRUCT<long:BIGINT>,
  end STRUCT<long:BIGINT>
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION '${input_dir}';
It's not possible to rename the column.
Does anybody know a solution to this problem, or have any ideas?
Thanks a lot in advance!
Can you try the below?
hive> set hive.support.quoted.identifiers=column;
hive> create temporary table sp_char ( `#` int, `end` string);
OK
Time taken: 0.123 seconds
OK
Time taken: 0.362 seconds
hive>
When you set the Hive property hive.support.quoted.identifiers=column, everything within backticks is treated as a literal identifier.
The other value for this property is none; when it is set to none, a backticked name is interpreted as a regular expression to match columns.
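Applied to the JSON table from the question above, a sketch (only the set statement and the backticks around the reserved names are new; everything else is the original DDL):

set hive.support.quoted.identifiers=column;

CREATE TEMPORARY EXTERNAL TABLE ${tmp_db}.${tmp_table}
(
  id STRING,
  email STRUCT<string:STRING>,
  `start` STRUCT<long:BIGINT>,
  `end` STRUCT<long:BIGINT>
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION '${input_dir}';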
Hope this helps

Select statement in hive return some columns with null value

I have seen this type of question asked many times, but those solutions did not work for me. I created an external Hive table, since my data is the output of a map-only job. Then I gave the path to the specific file in a LOAD command, which returned OK. But when I run select * from table, some columns are returned with NULL values. Each command I executed, the input file, and the error are in the attached screenshots.
My delimiter in the file is ||, so I specified the same in the create table command too.
I have also tried a normal table instead of an external table, and it showed the same error. I also tried specifying the delimiter as //|| and as \|\|. None of these worked.
The problem you are facing is caused by using multiple characters as the FIELD delimiter.
According to the documentation, the FIELD delimiter must be a single char:
row_format
  : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]]
    [COLLECTION ITEMS TERMINATED BY char]
    [MAP KEYS TERMINATED BY char]
    [LINES TERMINATED BY char]
    [NULL DEFINED AS char]  -- (Note: Available in Hive 0.13 and later)
You need to change your data to use only a single-char field delimiter.
If you cannot do that, the other approach is to use a staging table with a single field. Load your data into that table, then split the staging column on the || delimiter and insert the result into your actual target table, as sketched below. You need to make sure the field counts are consistent in the data, otherwise your final output will be off.
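A minimal sketch of that staging approach (the table and column names are hypothetical, and three fields are assumed; Hive's split() takes a Java regular expression, so each pipe must be escaped):

-- hypothetical staging table holding each raw line as a single string
CREATE EXTERNAL TABLE stage_raw (line STRING)
LOCATION '/path/to/raw/data';

-- split() takes a Java regex, so || is written '\\|\\|'
INSERT INTO TABLE target_table
SELECT
  split(line, '\\|\\|')[0],
  split(line, '\\|\\|')[1],
  split(line, '\\|\\|')[2]
FROM stage_raw;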
Reference:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTableCreate/Drop/TruncateTable

Table or view not found - convert Hive table to Spark DataFrame

I am trying to do the following operation:
import hiveContext.implicits._
val productDF=hivecontext.sql("select * from productstorehtable2")
println(productDF.show())
The error I am getting is
org.apache.spark.sql.AnalysisException: Table or view not found:
productstorehtable2; line 1 pos 14
I am not sure why that is occurring.
I have used this in my Spark configuration:
set("spark.sql.warehouse.dir", "hdfs://quickstart.cloudera:8020/user/hive/warehouse")
and the location shown when I run describe formatted productstorehtable2 is
hdfs://quickstart.cloudera:8020/user/hive/warehouse/productstorehtable2
I used this code to create the table:
create external table if not exists productstorehtable2
(
device string,
date string,
word string,
count int
)
row format delimited fields terminated by ','
location 'hdfs://quickstart.cloudera:8020/user/cloudera/hadoop/hive/warehouse/VerizonProduct2';
I use sbt (with Spark dependencies) to run the application. My OS is CentOS and I have Spark 2.0.
Could someone help me spot where I am going wrong?
Edit: when I run println(hivecontext.sql("show tables")), it just outputs a blank line.
Thanks

Data (Single Quotes and Double Quotes) Mismatch in Hive

While loading a file from the mainframe into Hadoop in ORC format, some of the data was loaded with single quotes (') and the rest with double quotes ("), even though the complete source file uses single quotes (').
The Hive COBOL SerDe was used to specify custom delimiters.
Example:
Source data:
First_Name Last_name Address
Rev 'Har' O'Amy 4031 'B' Ave
Loaded into Hadoop, some data has the correct format (') and some has double quotes ("), as below:
First_Name Last_name Address
Rev "Har" O"Amy 4031 "B" Ave
What could be the issue, and how can it be solved?
One possible issue might be the SerDe and encoding given at table creation time. So try
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('serialization.encoding'='UTF-8');
while creating the Hive table, and then load the data.
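For context, a sketch of how that property fits into a full table definition; the table name, columns, and tab field delimiter here are hypothetical:

CREATE EXTERNAL TABLE mainframe_names (
  first_name STRING,
  last_name STRING,
  address STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'serialization.encoding'='UTF-8',  -- encoding suggested above
  'field.delim'='\t'                 -- hypothetical field delimiter
)
LOCATION '/path/to/names';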
Also, if you want your data cleaned of special characters, try the CSV SerDe from this link: https://github.com/ogrodnek/csv-serde

LINES TERMINATED BY only supports newline '\n' right now

I have files where the columns are delimited by char(30) and the lines are delimited by char(31). I'm using these delimiters mainly because the columns may contain newlines (\n), so the default line delimiter in Hive is not useful for us.
I tried to change the line delimiter in Hive but got the error below:
LINES TERMINATED BY only supports newline '\n' right now.
Any suggestions? Would writing a custom SerDe work?
Is there any plan to enhance this functionality in new Hive releases?
Thanks
Not sure if this helps, or is the best answer, but when faced with this issue, what we ended up doing was setting the textinputformat.record.delimiter Map/Reduce Java property to the value being used. In our case it was the string "{EOL}", but it could be any unique string for all practical purposes.
We set this in our beeline shell, which allowed us to pull back the fields correctly. It should be noted that once we did this, we converted the data to Avro as fast as possible, so we didn't need to explain to every user (and the user's baby brother) how to set the {EOL} line delimiter.
set textinputformat.record.delimiter={EOL};
Here is the full example.
#example CSV data (fields separated by '^', which is octal \136, and ends of lines marked by the string '{EOL}')
ID^TEXT
11111^Some THings WIth
New Lines in THem{EOL}11112^Some Other THings..,?{EOL}
111113^Some crazy thin
gs
just crazy{EOL}11114^And Some Normal THings.
#here is the CSV table we laid on top of the data
CREATE EXTERNAL TABLE CRAZY_DATA_CSV
(
ID STRING,
TEXT STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\136'
STORED AS TEXTFILE
LOCATION '/archive/CRAZY_DATA_CSV'
TBLPROPERTIES('skip.header.line.count'='1');
#here is the Avro table which we'll migrate into below.
CREATE EXTERNAL TABLE CRAZY_DATA_AVRO
(
ID STRING,
TEXT STRING
)
STORED AS AVRO
LOCATION '/archive/CRAZY_DATA_AVRO'
TBLPROPERTIES ('avro.schema.url'='hdfs://nameservice/archive/avro_schemas/CRAZY_DATA.avsc');
#And finally, the magic is here. We set the custom delimiter and import into our Avro table.
set textinputformat.record.delimiter={EOL};
INSERT INTO TABLE CRAZY_DATA_AVRO SELECT * from CRAZY_DATA_CSV;
I worked it out by using the --hive-delims-replacement ' ' option during the Sqoop extract, so the characters \n, \001, and \r are replaced with a space in the columns.
