Hive Command Line - problems with backticks in column name - hadoop

When I try to create a table from the beeline / hive command line with the following DDL:
CREATE EXTERNAL TABLE schema.table
(
`Week` string,
`Orders` string,
`Units` string
)
COMMENT 'This table was auto generated'
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
'separatorChar' = ',',
'quoteChar' = '\"',
'escapeChar' = '\\'
)
STORED AS TEXTFILE
LOCATION '/data/qa/ingest_id=1543338670'
TBLPROPERTIES ("skip.header.line.count"="1");
I get the following error:
Error: Error while compiling statement: FAILED: ParseException line 3:0 character '▒' not supported here
line 3:1 character '▒' not supported here
line 3:2 character '▒' not supported here (state=42000,code=40000)
Has anyone faced this issue before? This DDL executes without issues on a GUI client.

The issue was related to UTF-8 encoding. I removed the non-ASCII characters by piping the DDL through tr in the shell:
tr -d '\200-\277' | tr -d '\300-\377'
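The same clean-up can be sketched as a small filter. This is a minimal sketch, not the poster's exact script: the function name strip_non_ascii and the file name ddl.hql are hypothetical, and the curly quotes in the demo are only one assumed way such bytes can end up in a pasted DDL. The two ranges from the answer are merged into one, and LC_ALL=C makes tr work on raw bytes.

```shell
#!/bin/sh
# strip_non_ascii: read stdin, delete every byte from octal 200 to 377
# (all bytes that are part of a multi-byte UTF-8 character), write to stdout.
strip_non_ascii() {
    LC_ALL=C tr -d '\200-\377'
}

# Demo: a column line whose backticks were replaced by curly quotes
# (U+2018 and U+2019, three bytes each). This input is an assumption.
printf '\342\200\230Week\342\200\231 string,\n' | strip_non_ascii

# Hypothetical usage on a saved DDL file:
#   strip_non_ascii < ddl.hql > ddl_clean.hql
```

The offending characters are deleted, not replaced, so any quoting they stood for (for example backticks around a column name) has to be typed again in plain ASCII.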

Related

Export hql output to csv in beeline

I am trying to export my HQL output to CSV in beeline using the command below:
beeline -u "jdbc:hive2://****/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2"?tez.queue.name=devices-jobs --outputformat=csv2 -e "use schema_name; select * from table_name where open_time_new>= '2020-07-13' and open_time_new < '2020-07-22'" > filename.csv
The problem is that some column values in the table contain commas, which pushes part of that column's data into the next column.
For example:
| abcd | as per data,outage fault,xxxx.
| xyz |as per the source,ghfg,hjhjg.
The above data gets saved as 4 columns instead of 2.
Need help!
Try writing the result to a local directory instead:
insert overwrite local directory '/tmp/local_csv_report'
row format delimited fields terminated by "," escaped by '\\'
select *
from table_name
where open_time_new >= '2020-07-13'
and open_time_new < '2020-07-22'
This may create several CSV files under your local /tmp/local_csv_report directory, so a simple cat afterwards will merge the results into a single file.
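The merge step can be sketched as follows. The helper name merge_parts and the part file names 000000_0 and 000001_0 are stand-ins for illustration, and the sample rows assume that the ESCAPED BY clause writes embedded commas as \, in the output.

```shell
#!/bin/sh
# merge_parts DIR OUT: concatenate every regular file in DIR into OUT,
# in glob order. OUT should live outside DIR so it is not merged into itself.
merge_parts() {
    dir=$1
    out=$2
    : > "$out"                      # start from an empty output file
    for part in "$dir"/*; do
        if [ -f "$part" ]; then
            cat "$part" >> "$out"
        fi
    done
}

# Demo with stand-in part files; a real run would point at /tmp/local_csv_report.
demo_dir=$(mktemp -d)
printf 'abcd,as per data\\,outage fault\\,xxxx.\n' > "$demo_dir/000000_0"
printf 'xyz,as per the source\\,ghfg\\,hjhjg.\n' > "$demo_dir/000001_0"
merge_parts "$demo_dir" "$demo_dir.csv"
cat "$demo_dir.csv"
rm -rf "$demo_dir" "$demo_dir.csv"
```

Whatever reads the merged file then has to treat \, as a literal comma rather than a field separator.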

sqlldr stream record format on commandline

I am loading a delimited file using sqlldr. I have kept file format/table details in the ctl file and pass other parameters on the command line.
sqlldr control=sp.ctl data=data.20170502.txt SKIP=1 userid=xyz#db/pwd log=sp.log bad=sp.bad
sp.ctl
LOAD DATA
TRUNCATE
INTO TABLE "T_DATA"
TRUNCATE
FIELDS TERMINATED BY '|'
TRAILING NULLCOLS
(
C_1 CHAR(2000),
C_2 CHAR(2000),
C_3 CHAR(2000)
)
I now need to use a stream record format on this data file.
infile 'example3.dat' "str '|\n'"
However, I am not using the infile syntax.
So I tried using:
sqlldr control=sp.ctl data=data.20170502.txt "str '!\n'" SKIP=1
userid=xyz#db/pwd log=sp.log bad=sp.bad
It gives an error:
LRM-00112: multiple values not allowed for parameter 'data'
How do I pass the record delimiter on the command line?

Hive table creation error through Bash Shell

Can anyone tell me why I am getting an error while creating a partitioned table from the bash shell?
[cloudera@localhost ~]$ hive -e "create table peoplecountry (
name1 string,
name2 string,
salary int,
country string
)
partitioned by (country string)
row format delimited
column terminated by '\n'";
Logging initialized using configuration in jar:file:/usr/lib/hive/lib/hive-common-0.10.0-cdh4.7.0.jar!/hive-log4j.properties
Hive history file=/tmp/cloudera/hive_job_log_0fdf7083-8ab4-499f-8048-a85f162d1357_376056456.txt
FAILED: ParseException line 8:0 missing EOF at 'column' near 'delimited'
If you mean a newline at the end of each row of your data, use:
lines terminated by '\n'
instead of column terminated by.
If you mean that the columns within a row are separated by a delimiter, specify it as:
fields terminated by '\n'
Refer to:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL

hive load data:how to specify file column separator and dynamic partition columns?

I have a question about loading MySQL data into Hive 2. I don't know how to specify the separator; I tried several times without success.
Below is the Hive table; id is the partition column:
0: jdbc:hive2://localhost/> desc test;
+-----------+------------+----------+
| col_name  | data_type  | comment  |
+-----------+------------+----------+
| a         | string     |          |
| id        | int        |          |
+-----------+------------+----------+
When I execute
load data local inpath 'file:///root/test' into table test partition (id=1);
it fails with:
Invalid path ''file:///root/test'': No files matching path file
but the file does exist.
I want the partitions to be assigned dynamically from the file, so I added the partition column to the file itself, like this:
root#<namenode|~>:#cat /root/test
a,1
b,2
That also failed. The docs say nothing about this, so I guess it is not supported right now.
Does anyone have an idea? Any help will be appreciated!
To specify the column separator, use this clause in the table definition:
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
Replace the ',' with your separator.
To partition a Hive table, declare the partition column with PARTITIONED BY:
CREATE TABLE Foo (bar int )
PARTITIONED BY (testpartition string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
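For the dynamic-partition part of the question, one commonly used approach is to load the file into an unpartitioned staging table and then insert from it into the partitioned table. The sketch below only writes such a script to a file; the names test_staging, load_test.hql and write_load_script are hypothetical, and the local path is written without the file:// scheme as an assumption, not as a confirmed fix for the "Invalid path" error.

```shell
#!/bin/sh
# write_load_script FILE: write a HiveQL script that stages the
# comma-separated file and fills the partitioned table from it.
write_load_script() {
    cat > "$1" <<'EOF'
-- staging table (hypothetical name) whose layout matches the file: a,id
create table test_staging (a string, id int)
row format delimited fields terminated by ',';

load data local inpath '/root/test' into table test_staging;

-- let Hive derive the partition value from the id column of each row
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

insert into table test partition (id)
select a, id from test_staging;
EOF
}

write_load_script load_test.hql      # hypothetical file name
cat load_test.hql
```

The partition column comes last in the SELECT list, which is how a dynamic-partition insert matches values to partitions.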

LOAD DATA query error

What is the problem with this line?
$load ="LOAD DATA INFILE $inputFile INTO TABLE $tableName FIELDS TERMINATED BY ',' LINES TERMINATED BY '\r\n' IGNORE 1 LINES";
echo $load;
mysql_query($load);
The echo result is:
LOAD DATA INFILE appendpb.csv INTO TABLE appendpb_csv FIELDS TERMINATED BY ',' LINES TERMINATED BY ' ' IGNORE 1 LINES
The error is:
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'appendpb.csv INTO TABLE appendpb_csv FIELDS TERMINATED BY ',' LINES TERMINATED B' at line 1
According to the MySQL LOAD DATA reference, the file name must be given as a quoted string, so put single quotes around the input file:
$load ="LOAD DATA INFILE '$inputFile' INTO TABLE $tableName FIELDS TERMINATED BY ',' LINES TERMINATED BY '\r\n' IGNORE 1 LINES";
The resulting query then looks like this:
LOAD DATA INFILE 'appendpb.csv' INTO TABLE appendpb_csv FIELDS TERMINATED BY ',' LINES TERMINATED BY ' ' IGNORE 1 LINES
This assumes the path to the file is correct.
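The quoting fix is independent of PHP. As a rough illustration in shell (the helper name build_load_query is hypothetical), the statement can be assembled so that the file name always lands inside single quotes while the table name stays bare. The sketch does not escape quotes inside the file name, so it is only suitable for trusted input.

```shell
#!/bin/sh
# build_load_query FILE TABLE: print a LOAD DATA statement with the file
# name wrapped in single quotes. The line terminator is passed through %s,
# so the backslashes reach MySQL literally as \r\n.
build_load_query() {
    printf "LOAD DATA INFILE '%s' INTO TABLE %s FIELDS TERMINATED BY ',' LINES TERMINATED BY '%s' IGNORE 1 LINES\n" \
        "$1" "$2" '\r\n'
}

build_load_query appendpb.csv appendpb_csv
```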
