How to use aggregate functions in Hive on Group by columns - hadoop

When I try to use a built-in function or my own UDF on the GROUP BY columns in Hive, as below, I get an error:
select col1, col2 from xyz group by my_func(col1), col2
It keeps complaining that column col1 is not found in the GROUP BY expression.

When you apply a function to a column, the result is no longer called the same thing. Select the same expression that you group by, and name it explicitly using the as keyword in the select list.
select my_func(col1) as group1, col2 as group2 from xyz group by my_func(col1), col2;
Also, if you're only selecting the columns that you're grouping by, not the actual grouped data, maybe distinct would be more appropriate than group by?

The call to the aggregate function is in the wrong place. It should be made as follows:
Select my_func(col1),col2 from xyz group by col1,col2

select my_func(col1) as col1, col2 from xyz group by my_func(col1), col2
The basic rule is that your GROUP BY needs to contain every non-aggregated expression that you mention in the SELECT clause.
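To make the rule concrete, here is a minimal runnable sketch. It makes two assumptions that are not in the question: SQLite (through Python's sqlite3 module) stands in for Hive, and the built-in upper() stands in for my_func. The sample rows are made up. The point it illustrates is that the grouping expression is repeated in both SELECT and GROUP BY, while the alias appears only in the select list.

```python
import sqlite3

# Assumption: SQLite stands in for Hive, and upper() stands in for my_func.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xyz (col1 TEXT, col2 INTEGER)")
conn.executemany(
    "INSERT INTO xyz VALUES (?, ?)",
    [("a", 1), ("A", 1), ("b", 2), ("B", 2), ("b", 3)],  # made-up sample data
)

# The same expression appears in SELECT and in GROUP BY;
# the alias is declared only in the select list.
rows = conn.execute(
    """
    SELECT upper(col1) AS group1, col2 AS group2, COUNT(*) AS n
    FROM xyz
    GROUP BY upper(col1), col2
    ORDER BY group1, group2
    """
).fetchall()

for row in rows:
    print(row)
```

Hive may differ from SQLite in details such as alias resolution, so treat this as an illustration of the query shape rather than a statement about Hive's behaviour.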

Related

How do I exclude certain columns within a transformation?

When using Upsolver SQLake, if my source table has hundreds of columns and I want to include most of them in a transformation but exclude a few, can I do that without having to explicitly map every column in the transformation SQL?
For example, if my source table has 5 columns (col1, col2, col3, col4, col5) and I do not want to include col3 in my transformation, I could use the following SQL:
SELECT col1, col2, col4, col5 FROM sourcetable
However, if my source table has 1000 columns, I'd rather not have to type out 999 columns if I don't have to.
I was looking for an option to generate SQL, or some option to exclude certain columns from a transformation.
SQLake supports an EXCEPT parameter in the transformation job definition. The transformation SQL is still evaluated, but the columns listed in the EXCEPT reference are excluded from the target table.
CREATE JOB insert_all_columns_except_col3
START_FROM = NOW
ADD_MISSING_COLUMNS = TRUE
RUN_INTERVAL = 1 MINUTE
AS INSERT INTO target_table MAP_COLUMNS_BY_NAME EXCEPT col3
SELECT *
FROM source_table
WHERE $commit_time BETWEEN RUN_START_TIME() and RUN_END_TIME();
In this case, all columns from "source_table" will be written into "target_table" except for col3.

Print column2 from row with max(column1) without including column2 in group by clause

I know it is a silly question and may be already answered somewhere, please guide me to the link if it is.
I want to print a column which is not included in group by clause. Oracle says that it should be included in group by expression, but I want value to be from the same row from which max() value for the other column was selected.
For example: if I have a table with following columns:
Employee_Name, Action_code, Action_Name
I want to see the name of the action with the maximum action_code for each employee. Also, I cannot use a subquery in the condition.
I want something like this:
select employee_name, max(action_code), action_name --for max code
from emp_table
group by employee_name
The action_name in the select statement is causing the problem. If I add action_name to the group by clause, it will show the action name for each action for each employee, which makes the query meaningless.
Thanks for support
You can use a keep .. last pattern:
select employee_name,
max(action_code) as action_code,
max(action_name) keep (dense_rank last order by action_code) as action_name
from emp_table
group by employee_name
The documentation explains this more fully under the sister function first().
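For comparison, the same "value from the row with the maximum" result can be sketched with ROW_NUMBER() in an inline view. This is a different technique from the keep .. last pattern above, not a restatement of it. Assumptions: SQLite 3.25 or later (through Python's sqlite3) stands in for Oracle so the example can run, and the sample rows are invented. The inline view sits in the FROM clause, not in the condition.

```python
import sqlite3

# Assumption: SQLite (3.25+ for window functions) stands in for Oracle.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emp_table (employee_name TEXT, action_code INTEGER, action_name TEXT)"
)
conn.executemany(
    "INSERT INTO emp_table VALUES (?, ?, ?)",
    [  # made-up sample data
        ("ann", 1, "hire"),
        ("ann", 3, "promote"),
        ("bob", 2, "transfer"),
        ("bob", 1, "hire"),
    ],
)

# Number each employee's rows from highest action_code down,
# then keep only the first row per employee.
latest = conn.execute(
    """
    SELECT employee_name, action_code, action_name
    FROM (
        SELECT employee_name, action_code, action_name,
               ROW_NUMBER() OVER (
                   PARTITION BY employee_name
                   ORDER BY action_code DESC
               ) AS rn
        FROM emp_table
    ) AS ranked
    WHERE rn = 1
    ORDER BY employee_name
    """
).fetchall()

for row in latest:
    print(row)
```

If two rows for one employee tie on action_code, ROW_NUMBER() keeps one of them arbitrarily, whereas the keep .. last form with max(action_name) picks the largest action_name among the tied rows.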

How to only select existing values from oracle?

I have a table with a massive number of columns. So many, that when I do SELECT * I can't even see any values because all the columns fill up the screen. I'd like to do something like this:
SELECT * FROM my_table WHERE NAME LIKE '%unique name%' AND <THIS COLUMN> IS NOT NULL
Is this possible? Note: VALUE is not a column.
There are so many questions on SO that ask this same question, but they have some bizarre twist, and the actual question is not answered.
I've tried:
SELECT * FROM my_table WHERE NAME LIKE '%unique name%' AND VALUE NOT NULL
*
Invalid relational operator
SELECT * FROM my_table WHERE NAME LIKE '%unique name%' AND VALUE <> ''
*
'VALUE': invalid identifier
SELECT * FROM my_table WHERE NAME LIKE '%unique name%' AND COLUMN NOT NULL
*
Missing Expression
Bonus Questions:
Is there any way to force Oracle to only show one output screen at a time?
Is there a variable to use in the WHERE clause that relates to the current column? Such as: WHERE this.column = '1', where it would check each column to match that expression?
Is there any way to get back your last command in Oracle? (I have to remote into a Linux box running Oracle - it's all command line - can't even copy/paste, so I have to type every command by hand, with a wonky connection, so it's taking an extremely long time to debug this stuff)
If you are trying to find all the non null column values for a particular record you could try an unpivot provided all the columns you are unpivoting have the same data type:
SELECT *
FROM (select * from my_table where name like '%unique value%')
UNPIVOT [include nulls] (col_value FOR col_name IN (col1, col2, ..., coln))
With the above code, null values will be excluded unless you add the optional include nulls clause. You will also need to explicitly list each column you want unpivoted.
If they don't all have the same data type, you can use a variation that doesn't necessarily prune away all the null values:
select *
from (select * from my_table where name like '%unique value%')
unpivot ((str_val, num_val, date_val)
for col_name in ((cola, col1, date1)
,(colb, col2, date2)
,(colc, col3, date1)));
You can have a fairly large set of column groups, though here I'm showing just three, one for each major data type. In the IN list you need to list a column for each position in your column group, though you can reuse columns, as shown by the date_val position where I've used date1 twice. As an alternative to reusing an existing column, you could use a dummy column with a null value:
select *
from (select t1.*, null dummy from my_table t1 where name like '%unique value%')
unpivot ((str_val, num_val, date_val)
for col_name in ((dummy, col1, date1)
,(colb, dummy, date2)
,(colc, col3, dummy)));
Have you tried this?
SELECT * FROM my_table WHERE NAME LIKE '%unique name%' AND value IS NOT NULL;
Oracle / PLSQL: IS NOT NULL Condition
For row number:
SELECT field1, field2 FROM (SELECT field1, field2, ROW_NUMBER() OVER (PARTITION BY unique_field ORDER BY unique_field) r FROM my_table) WHERE r = 1;
Usually in Linux consoles you can use the up and down arrow keys to recall the last command.

Oracle: Selecting * and aggregate column

Is it possible to select fields using the method below?
SELECT *, count(FIELD) FROM TABLE GROUP BY TABLE
I get the following error
ORA-00923: FROM keyword not found where expected
00923. 00000 - "FROM keyword not found where expected"
*Cause:
*Action:
Error at Line: 1 Column: 9
Is it a syntax error or do you have to explicitly define each column rather than using *?
You can't combine an unqualified * with other columns. If you qualify it with a table alias, then you can:
SELECT t.*
, count(FIELD)
FROM TABLE t
Also, your GROUP BY TABLE is wrong. You can't group by the table name, you must specify some columns, like this:
SELECT t.customer
, count(FIELD)
FROM TABLE t
GROUP BY t.customer
Each item in the select list should be
an expression used as one of the group by criteria, or
an aggregate function, or
a literal value
So you need to list the fields you want, and each one must fit one of the criteria above.
SELECT FIELD1,FIELD2, COUNT(*) FROM TABLE1 GROUP BY FIELD1, FIELD2
If you insist on keeping the logic of your query, a subquery should help.
For example,
SELECT * FROM TABLE1 T1 INNER JOIN (SELECT FIELD1, COUNT(FIELD1) AS CountOfFIELD1 FROM TABLE1 T2 GROUP BY FIELD1) T3 ON T1.FIELD1=T3.FIELD1
Instead of * you need to give the column names:
SELECT a, b, COUNT(FIELD)
FROM TABLE
GROUP BY a, b;
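The subquery approach from the answers above can be tried end to end with a small sketch. Assumptions: SQLite (through Python's sqlite3) stands in for Oracle, and the table table1, its two columns, and the sample rows are made up for illustration. Every detail row is kept, and the per-group count is attached by joining to an aggregated subquery.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; table and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (field1 TEXT, field2 INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [("x", 1), ("x", 2), ("y", 3)],
)

# The inner query aggregates per field1; the join copies that count
# onto every detail row, so no GROUP BY is needed in the outer query.
rows = conn.execute(
    """
    SELECT t1.field1, t1.field2, t3.count_of_field1
    FROM table1 t1
    INNER JOIN (
        SELECT field1, COUNT(field1) AS count_of_field1
        FROM table1
        GROUP BY field1
    ) t3 ON t1.field1 = t3.field1
    ORDER BY t1.field1, t1.field2
    """
).fetchall()

for row in rows:
    print(row)
```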

Can I use a column in the ORDER BY clause

I have a specific requirement. This is my query. Here, amount is a column in my table, but I did not mention the amount column in the select statement. Can I still use this column in the order by clause?
SELECT stud_name, stud_roll, stud_prg
FROM programcl
ORDER BY 3, amount, 1;
Yes, you can mix both positional and named assignments in your ORDER BY clause.
The positional assignments must appear in your SELECT list. The named assignments do not have to.
Can I use this column in the order by clause?
Yes, of course you can use a column in the order by clause that was not selected in your select statement.
For example
select col1 from tab1
order by col2;
This way you get the values of col1, displayed in the order of col2.
It's worth trying.
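A quick way to try the query from the question is the sketch below. Assumptions: SQLite (through Python's sqlite3) stands in for the asker's database, and the column types and sample rows are invented. It sorts by position 3 (stud_prg), then by the unselected amount column, then by position 1 (stud_name).

```python
import sqlite3

# Assumption: SQLite stands in for the asker's database; data is made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE programcl "
    "(stud_name TEXT, stud_roll INTEGER, stud_prg TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO programcl VALUES (?, ?, ?, ?)",
    [("Asha", 2, "BSc", 500), ("Ben", 1, "BSc", 300), ("Cara", 3, "BA", 900)],
)

# Positions 3 and 1 refer to the select list; amount is referenced by name
# and does not have to be selected.
rows = conn.execute(
    """
    SELECT stud_name, stud_roll, stud_prg
    FROM programcl
    ORDER BY 3, amount, 1
    """
).fetchall()

for row in rows:
    print(row)
```

Within the BSc group, Ben comes before Asha because his amount is lower, even though amount is not part of the output.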
