When I run the Hive query below, I receive the following error: Error while compiling statement:
FAILED: SemanticException [Error 10025]: line 8:13 Expression not in GROUP BY key '50000'
Hive Query:
SELECT
202106 as ANOMES,
count(wrin_agctd) as QTDETransacoes,
tipo_transacao,
cod_cate_cont,
wrin_nterm,
case
when wrin_valor<50000 then '<500'
when wrin_valor<100000 then '<1000'
when wrin_valor<150000 then '<1500'
when wrin_valor<200000 then '<2000'
end as test
FROM
ghp00468.raultav_saque_conta_salario_tecban_202106
WHERE
tipo_transacao="SAQUE_TECBAN" and
tipo_transacao="CONSULTA_TECBAN"
GROUP BY
tipo_transacao,
cod_cate_cont,
wrin_nterm;
Where and what is the problem?
The issue is your CASE expression: the derived column test does not appear in the GROUP BY clause. In Hive, every non-aggregated expression in the SELECT list must also appear in the GROUP BY, so either repeat the CASE expression in the GROUP BY or apply it outside the grouping query. Here's a similar question that someone answered.
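A sketch of the corrected query, with the CASE expression repeated verbatim in the GROUP BY (it must match the SELECT text exactly). Note also that the original WHERE combines two equality tests on the same column with AND, which can never both hold at once; OR is presumably what was intended:

```sql
SELECT
  202106 AS ANOMES,
  count(wrin_agctd) AS QTDETransacoes,
  tipo_transacao,
  cod_cate_cont,
  wrin_nterm,
  CASE
    WHEN wrin_valor < 50000  THEN '<500'
    WHEN wrin_valor < 100000 THEN '<1000'
    WHEN wrin_valor < 150000 THEN '<1500'
    WHEN wrin_valor < 200000 THEN '<2000'
  END AS test
FROM ghp00468.raultav_saque_conta_salario_tecban_202106
WHERE tipo_transacao = "SAQUE_TECBAN" OR tipo_transacao = "CONSULTA_TECBAN"
GROUP BY
  tipo_transacao,
  cod_cate_cont,
  wrin_nterm,
  CASE
    WHEN wrin_valor < 50000  THEN '<500'
    WHEN wrin_valor < 100000 THEN '<1000'
    WHEN wrin_valor < 150000 THEN '<1500'
    WHEN wrin_valor < 200000 THEN '<2000'
  END;
```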
I am trying to execute below query in vertica:
select
case when count(*) = 0 then #{aaa} || #{bbb}
else
trim(cast(max(ccc)+1 as intger))
end
as ccc
from apple where aaa = #{aaa}
The query returns the expected results in Teradata, but in Vertica it throws an error.
How can I do this?
Check the Vertica documentation:
https://www.vertica.com/docs/10.0.x/HTML/Content/Authoring/SQLReferenceManual/Functions/String/BTRIM.htm
BTRIM() removes blank characters (by default, if no second parameter is specified) from a passed string. You can't BTRIM() a number.
Why would you want to trim an integer, as in trim(cast(max(ccc)+1 as int**e**ger))? It makes no sense at all! (Note the typo, too: intger should be integer.) Teradata should complain as well.
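A sketch of a corrected version, assuming the #{...} placeholders are string parameters: drop the TRIM, fix the intger typo, and cast the numeric branch to VARCHAR so both CASE branches return the same (string) type:

```sql
select
  case when count(*) = 0 then #{aaa} || #{bbb}
       else cast(max(ccc) + 1 as varchar)
  end as ccc
from apple
where aaa = #{aaa};
```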
I have 2 Hadoop clusters: one has hive-0.10.0 installed and the other has hive-1.1.0.
I am able to run the query below in hive-1.1.0, which returns the date 30 days before the present date:
select date_sub(from_unixtime(floor(unix_timestamp()/(60*24*24))*60*24*24), 30)
But the same query gives a syntax error in hive-0.10.0:
FAILED: ParseException line 1:79 mismatched input '&lt;EOF&gt;' expecting FROM near ')' in from clause
1.
Way too complicated.
This will get you the same result:
select date_sub(from_unixtime(unix_timestamp()),30)
2.
Queries without a FROM clause are only supported from Hive 0.13 onward:
https://issues.apache.org/jira/browse/HIVE-178
Create a table with a single row (similar to Oracle's DUAL) and use it as the source.
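A minimal sketch of that workaround for Hive 0.10 (the table name and file path are examples, not from the original post): create a one-row "dual" table once, then run the simplified query against it:

```sql
CREATE TABLE dual (dummy STRING);
-- load a local file containing a single line, e.g. the letter X:
LOAD DATA LOCAL INPATH '/tmp/dual.txt' OVERWRITE INTO TABLE dual;

SELECT date_sub(from_unixtime(unix_timestamp()), 30) FROM dual;
```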
I am trying to remove hard-coding from a Hive script. For that I have created an HQL file (src_sys_cd_param.hql).
I set the source-system value through a parameter; the param file is invoked like this:
hive -f /data/data01/dev/edl/md/sptfr/landing/src_sys_cd_param.hql;
The param file contains the command set src_sys_cd = 'M09';
After running the script below:
INSERT INTO TABLE SPTFR_CORE.M09_PRTY SELECT C.EDW_SK,A.PRTY_TYPE_HIER_ID,
A.PRTY_NUM,A.PRTY_DESC,A.PRTY_DESC,'N',${hiveconf:src_sys_cd},
A.DAI_UPDT_DTTM,A.DAI_CRT_DTTM
FROM SPTFR_STG.M09_PRTY_VIEW_STG A JOIN SPTFR_STG.BKEY_PRTY_STG C
ON ( CONCAT(A.PRTY_TYPE_LVL_1_CD,'|^',A.PRTY_NUM ,'|^',A.SRC_SYS_CD)= C.SRC_CMBN);
I receive the error:
Error while compiling statement: FAILED: ParseException line 1:113 cannot recognize input near '$' '{' 'hiveconf' in selection target
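For context: a variable set via `set` inside one `hive -f` run does not carry over to a separate hive invocation, which would leave `${hiveconf:src_sys_cd}` unsubstituted. A sketch of two common ways to make it resolve (the main script's file name here is an assumed placeholder, not from the post):

```shell
# Pass the value on the command line (quotes kept so the value matches 'M09'):
hive --hiveconf src_sys_cd="'M09'" -f insert_m09_prty.hql

# ...or run the param file as an init script in the same session:
hive -i /data/data01/dev/edl/md/sptfr/landing/src_sys_cd_param.hql -f insert_m09_prty.hql
```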
Hive documentation lacking again:
I'd like to write the results of a query to a local file as well as the names of the columns.
Does Hive support this?
Insert overwrite local directory 'tmp/blah.blah' select * from table_name;
Also, separate question: Is StackOverflow the best place to get Hive help? @Nija has been very helpful, but I don't want to keep bothering them...
Try
set hive.cli.print.header=true;
Yes, you can. Put set hive.cli.print.header=true; in a .hiverc file in your home directory or in any of the other Hive user properties files.
Vague Warning: be careful, since this has crashed queries of mine in the past (but I can't remember the reason).
Indeed, @nija's answer is correct - at least as far as I know. There isn't any way to write the column names when doing an insert overwrite into [local] directory ... (whether you use local or not).
With regards to the crashes described by @user1735861, there is a known bug in Hive 0.7.1 (fixed in 0.8.0) that, after doing set hive.cli.print.header=true;, causes a NullPointerException for any HQL command/query that produces no output. For example:
$ hive -S
hive> use default;
hive> set hive.cli.print.header=true;
hive> use default;
Exception in thread "main" java.lang.NullPointerException
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:222)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:287)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:517)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
Whereas this is fine:
$ hive -S
hive> set hive.cli.print.header=true;
hive> select * from dual;
c
c
hive>
Non-HQL commands are fine, though (set, dfs, !, etc.).
More info here: https://issues.apache.org/jira/browse/HIVE-2334
Hive does support writing to the local directory. Your syntax looks right for it as well.
Check out the docs on SELECTS and FILTERS for additional information.
I don't think Hive has a way to write the names of the columns to a file for the query you're running... I can't say for sure it doesn't, but I do not know of a way.
I think the only place better than SO for Hive questions would be the mailing list.
I ran into this problem today and was able to get what I needed by doing a UNION ALL between the original query and a new dummy query that creates the header row. I added a sort column on each section and set the header to 0 and the data to a 1 so I could sort by that field and ensure the header row came out on top.
create table new_table as
select
field1,
field2,
field3
from
(
select
0 as sort_col, --header row gets lowest number
'field1_name' as field1,
'field2_name' as field2,
'field3_name' as field3
from
some_small_table --table needs at least 1 row
limit 1 --only need 1 header row
union all
select
1 as sort_col, --original query goes here
field1,
field2,
field3
from
main_table
) a
order by
sort_col --make sure header row is first
It's a little bulky, but at least you can get what you need with a single query.
Hope this helps!
Not a great solution, but here is what I do:
create table test_dat
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t" STORED AS
INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
LOCATION '/tmp/test_dat' as select * from YOUR_TABLE;
hive -e 'set hive.cli.print.header=true;select * from YOUR_TABLE limit 0' > /tmp/test_dat/header.txt
cat header.txt 000* > all.dat   # run inside /tmp/test_dat; 000* matches Hive's output files
Here's my take on it. Note, I'm not very well versed in bash, so improvement suggestions are welcome :)
#!/usr/bin/env bash
# works like this:
# ./get_data.sh database.table > data.csv
INPUT=$1
TABLE=${INPUT##*.}   # part after the last dot
DB=${INPUT%.*}       # part before the last dot
# With hive.cli.print.header=true the column names land on stdout
# while the data itself is written to the local directory.
HEADER=$(hive -e "
set hive.cli.print.header=true;
use $DB;
INSERT OVERWRITE LOCAL DIRECTORY '$TABLE'
row format delimited
fields terminated by ','
SELECT * FROM $TABLE;")
# Strip the "table." prefix Hive prepends to each column name,
# then turn the whitespace-separated header into a CSV line.
HEADER_WITHOUT_TABLE_NAME=${HEADER//$TABLE./}
echo "${HEADER_WITHOUT_TABLE_NAME//[[:space:]]/,}"
cat "$TABLE"/*