OK, so I know that when you use aggregate functions such as MAX, MIN, AVG and so forth in a SELECT statement, you need to list every selected column that does NOT use an aggregate function in the GROUP BY clause. Example:
SELECT name, MAX(age)
FROM person
GROUP BY name
But my issue arises when I use my own functions on certain columns together with an aggregate function in the same SELECT statement. Example:
SELECT f_fullname(name, surname) as fullname, max(age)
FROM person
Should I add the whole function call to the GROUP BY clause?
GROUP BY f_fullname(name, surname)
I ask because at the moment I get the ORA-00979 "not a GROUP BY expression" error.
Thanks for your help!
PS. The select statements are just for explanation purposes.
You can group by either the whole function call or the columns that are passed to it as parameters.
select f_fullname(name , surname) full_name, max (age)
from person
group by name, surname;
or
select f_fullname(name , surname) full_name, max (age)
from person
group by f_fullname(name , surname);
Here is a sqlfiddle demo
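To try both variants without an Oracle instance, here is a minimal sketch using Python's built-in sqlite3 module. The f_fullname implementation and the sample rows are assumptions made up for illustration. Note that SQLite does not raise ORA-00979, so this only shows that the two GROUP BY forms return the same rows for this data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical stand-in for the asker's f_fullname function.
conn.create_function("f_fullname", 2, lambda name, surname: f"{name} {surname}")

conn.execute("CREATE TABLE person (name TEXT, surname TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [("Ann", "Lee", 30), ("Ann", "Lee", 41), ("Bob", "Ray", 25)],  # made-up rows
)

# Variant 1: group by the columns that are passed to the function.
by_columns = conn.execute(
    "SELECT f_fullname(name, surname) AS full_name, MAX(age) "
    "FROM person GROUP BY name, surname ORDER BY full_name"
).fetchall()

# Variant 2: group by the whole function call.
by_function = conn.execute(
    "SELECT f_fullname(name, surname) AS full_name, MAX(age) "
    "FROM person GROUP BY f_fullname(name, surname) ORDER BY full_name"
).fetchall()

print(by_columns)   # [('Ann Lee', 41), ('Bob Ray', 25)]
print(by_function)  # same rows for this data
```

The two forms are only equivalent as long as different (name, surname) pairs never produce the same function result. If they can collide, grouping by the function call merges those rows into one group, while grouping by the columns keeps them apart.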
Let me explain the question.
I have two tables that each have 3 columns with the same data types. Together the 3 columns form a key/ID, if you like, but the column names differ between the tables.
Now I am creating queries with these 3 columns for both tables. I've managed to get the results for each table independently.
For example:
SELECT ID, FirstColumn, sum(SecondColumn)
FROM (SELECT ABC||DEF||GHI AS ID, FirstTable.*
FROM FirstTable
WHERE ThirdColumn = *1st condition*)
GROUP BY ID, FirstColumn
;
SELECT ID, SomeColumn, sum(AnotherColumn)
FROM (SELECT JKM||OPQ||RST AS ID, SecondTable.*
FROM SecondTable
WHERE AlsoSomeColumn = *2nd condition*)
GROUP BY ID, SomeColumn
;
So I run very similar queries against two different tables. I know that the two results share a certain number of rows with the same ID attribute, the one I've just created in the queries. I need to check which rows in one result are not in the other query's result, and vice versa.
Do I have to create temporary tables or views from the queries? Or should I join the two tables in a specific way and run only one query on them?
As a beginner I don't have any experience with using the results of one query as input for the next. I'm interested in the cleanest, most elegant way to do this.
No, you most probably don't need any "temporary" tables. The WITH clause (subquery factoring) would help.
Here's an example:
with
first_query as
(select id, first_column, ...
from (select ABC||DEF||GHI as id, ...)
),
second_query as
(select id, some_column, ...
from (select JKM||OPQ||RST as id, ...)
)
select id from first_query
minus
select id from second_query;
For another result you'd just switch the tables, e.g.
with ... <the same as above>
select id from second_query
minus
select id from first_query
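If you want to see the pattern run end to end, here is a small sketch using Python's built-in sqlite3 module. The table contents and the lower-case table names are assumptions for illustration, and SQLite spells the set-difference operator EXCEPT where Oracle uses MINUS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE first_table  (abc TEXT, def TEXT, ghi TEXT, second_column INTEGER);
CREATE TABLE second_table (jkm TEXT, opq TEXT, rst TEXT, another_column INTEGER);
-- made-up sample rows
INSERT INTO first_table  VALUES ('A','1','x',10), ('A','1','x',5), ('B','2','y',7);
INSERT INTO second_table VALUES ('A','1','x',3), ('C','3','z',9);
""")

# Both aggregations are defined once as CTEs; only the final
# set difference changes between the two directions.
QUERY = """
WITH first_query AS (
    SELECT abc || def || ghi AS id, SUM(second_column) AS total
    FROM first_table
    GROUP BY abc || def || ghi
),
second_query AS (
    SELECT jkm || opq || rst AS id, SUM(another_column) AS total
    FROM second_table
    GROUP BY jkm || opq || rst
)
SELECT id FROM {left}
EXCEPT            -- Oracle spells this MINUS
SELECT id FROM {right}
ORDER BY id
"""

only_in_first = [row[0] for row in conn.execute(
    QUERY.format(left="first_query", right="second_query"))]
only_in_second = [row[0] for row in conn.execute(
    QUERY.format(left="second_query", right="first_query"))]

print(only_in_first)   # ['B2y']
print(only_in_second)  # ['C3z']
```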
I am using CDH-5.4.4 (Cloudera edition). I have a CSV file in an HDFS location, and my requirement is to run real-time SQL queries in a Hadoop environment (OLTP).
So I decided to go with Impala. I created a metastore table over the CSV file and am executing queries in the Impala editor (within the HUE application).
When I execute the query below, I get this error:
"AnalysisException: all DISTINCT aggregate functions need to have the
same set of parameters as count(DISTINCT City); deviating function:
count(DISTINCT Country)".
CSV File
OrderID,CustomerID,City,Country
Ord01,Cust01,Aachen,Germany
Ord02,Cust01,Albuquerque,USA
Ord03,Cust01,Aachen,Germany
Ord04,Cust02,Arhus,Denmark
Ord05,Cust02,Arhus,Denmark
Problematic Query
Select CustomerID,Count(Distinct City),Count(Distinct Country) From CustomerOrders Group by CustomerID
Problem:
I am unable to execute an Impala query with more than one COUNT(DISTINCT ...) in it. I searched the internet and found the NDV() function suggested as a workaround, but NDV() only returns an approximate count of distinct values. I need an exact distinct count for more than one field.
Expectation:
What is the best way to get an exact distinct count for more than one field? Kindly modify the above query to work with Impala.
Note: This is not my original table; I have created a replica for this forum question.
I have the same problem in Impala. Here is my workaround:
SELECT CustomerID
,sum(nr_of_cities)
,sum(nr_of_countries)
FROM (
SELECT CustomerID
,Count(DISTINCT City) AS nr_of_cities
,0 AS nr_of_countries
FROM CustomerOrders
GROUP BY CustomerID
UNION ALL
SELECT CustomerID
,0 AS nr_of_cities
,Count(DISTINCT Country) AS nr_of_countries
FROM CustomerOrders
GROUP BY CustomerID
) AS aa
GROUP BY CustomerID
I think this can be done more cleanly (untested):
WITH
countries AS
(
SELECT CustomerID
,COUNT(DISTINCT Country) AS nr_of_countries
FROM CustomerOrders
GROUP BY 1
)
,
cities AS
(
SELECT CustomerID
,COUNT(DISTINCT City) AS nr_of_cities
FROM CustomerOrders
GROUP BY 1
)
SELECT CustomerID
,nr_of_cities
,nr_of_countries
FROM cities INNER JOIN countries USING (CustomerID)
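As a sanity check of the one-CTE-per-DISTINCT idea, here is a sketch that runs it on the sample CSV rows with Python's built-in sqlite3 module. This is SQLite, not Impala, so it only checks the logic. SQLite happens to accept several COUNT(DISTINCT ...) in one query, which lets us compare the workaround against the direct form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CustomerOrders "
    "(OrderID TEXT, CustomerID TEXT, City TEXT, Country TEXT)"
)
conn.executemany(
    "INSERT INTO CustomerOrders VALUES (?, ?, ?, ?)",
    [
        ("Ord01", "Cust01", "Aachen", "Germany"),
        ("Ord02", "Cust01", "Albuquerque", "USA"),
        ("Ord03", "Cust01", "Aachen", "Germany"),
        ("Ord04", "Cust02", "Arhus", "Denmark"),
        ("Ord05", "Cust02", "Arhus", "Denmark"),
    ],
)

# One CTE per DISTINCT count, joined back together on CustomerID.
workaround = conn.execute("""
WITH cities AS (
    SELECT CustomerID, COUNT(DISTINCT City) AS nr_of_cities
    FROM CustomerOrders
    GROUP BY CustomerID
),
countries AS (
    SELECT CustomerID, COUNT(DISTINCT Country) AS nr_of_countries
    FROM CustomerOrders
    GROUP BY CustomerID
)
SELECT CustomerID, nr_of_cities, nr_of_countries
FROM cities INNER JOIN countries USING (CustomerID)
ORDER BY CustomerID
""").fetchall()

# Direct form, accepted by SQLite, used here only as a reference result.
direct = conn.execute("""
SELECT CustomerID, COUNT(DISTINCT City), COUNT(DISTINCT Country)
FROM CustomerOrders
GROUP BY CustomerID
ORDER BY CustomerID
""").fetchall()

print(workaround)  # [('Cust01', 2, 2), ('Cust02', 1, 1)]
```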
Can someone see where I am going wrong in the below query? I am getting the error message that the GROUP BY column doesn't exist, but it clearly does as I see that column name in the output when I don't use the GROUP BY.
SELECT
(SELECT customer_address.post_code FROM customer_address WHERE customer_address.address_type = 0 AND customer_address.customer_no = orders.customer_no) postcode, SUM(orders.order_no) orders
FROM
orders, customer_address
WHERE
orders.delivery_date = '27-MAY-15'
GROUP BY
postcode;
The answer is: you cannot use a select-list alias in GROUP BY here, because the alias is not yet visible when GROUP BY is evaluated.
So:
GROUP BY (SELECT customer_address.post_code ...);
Or:
select postcode, sum(order_no)
from
(
SELECT
(SELECT customer_address.post_code FROM customer_address WHERE customer_address.address_type = 0 AND customer_address.customer_no = orders.customer_no) postcode,
orders.order_no
FROM orders, customer_address
WHERE orders.delivery_date = '27-MAY-15'
)
GROUP BY postcode;
EDIT:
However, your query seems wrong. Why do you cross-join orders and customer_address? By mistake, I guess. Use explicit joins (INNER JOIN customer_address ON ...) to avoid such errors. But here I guess you'd just have to remove ", customer_address" from the FROM clause.
Then why do you sum the order numbers? That doesn't seem to make sense.
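Putting these remarks together, here is a sketch of how the query could look with an explicit join, a derived table that carries the alias, and a count of orders instead of a sum of order numbers. All sample rows are made up, the ISO date strings and the COUNT(*) are my assumptions about what was intended, and it runs on Python's built-in sqlite3 module rather than on the asker's database. SQLite itself would accept the alias in GROUP BY, so the derived table is shown only because it is the portable form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_no INTEGER, customer_no INTEGER, delivery_date TEXT);
CREATE TABLE customer_address (customer_no INTEGER, address_type INTEGER, post_code TEXT);
-- made-up sample rows
INSERT INTO orders VALUES
    (1, 100, '2015-05-27'), (2, 100, '2015-05-27'),
    (3, 200, '2015-05-27'), (4, 200, '2015-05-28');
INSERT INTO customer_address VALUES
    (100, 0, 'AB1'), (100, 1, 'ZZ9'), (200, 0, 'CD2');
""")

rows = conn.execute("""
SELECT postcode, COUNT(*) AS order_count
FROM (
    -- The inner query defines the alias; the outer query may group by it.
    SELECT ca.post_code AS postcode, o.order_no
    FROM orders o
    INNER JOIN customer_address ca
        ON ca.customer_no = o.customer_no
       AND ca.address_type = 0
    WHERE o.delivery_date = '2015-05-27'
) AS per_order
GROUP BY postcode
ORDER BY postcode
""").fetchall()

print(rows)  # [('AB1', 2), ('CD2', 1)]
```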
I need to create a view, UOS_VU_STUDENT_AVERAGE, and one of its columns requires the average GRADE. SQL:
CREATE VIEW UOS_VU_STUDENT_AVERAGE AS
SELECT STUDENT.FIRST_NAME, STUDENT.LAST_NAME, STUDENT_MODULE.GRADE
FROM STUDENT, STUDENT_MODULE
WHERE STUDENT_ID<120000001
How can I average the grade in this SQL?
Try:
CREATE VIEW UOS_VU_STUDENT_AVERAGE AS
SELECT STUDENT.FIRST_NAME, STUDENT.LAST_NAME, avg(STUDENT_MODULE.GRADE)
FROM STUDENT, STUDENT_MODULE
WHERE STUDENT_ID<120000001
group by STUDENT.FIRST_NAME, STUDENT.LAST_NAME
As zerkms commented, there is no join condition, so you probably need something like this:
CREATE VIEW UOS_VU_STUDENT_AVERAGE AS
SELECT STUDENT.FIRST_NAME, STUDENT.LAST_NAME, avg(STUDENT_MODULE.GRADE)
FROM STUDENT join STUDENT_MODULE on student_module.STUDENT_ID = student.id
WHERE STUDENT_ID<120000001
group by STUDENT.FIRST_NAME, STUDENT.LAST_NAME
(I'm just guessing that the FK is on student_module.STUDENT_ID = student.id)
You need GROUP BY whenever you select an aggregate function such as
AVG ( [ ALL | DISTINCT ] expression )
together with non-aggregated columns, for example:
SELECT id, AVG(salary) FROM tablename GROUP BY id
You'll need to do something like the following. I've made a few assumptions on the structure of the tables.
CREATE VIEW uos_vu_student_average AS
SELECT first_name, last_name, AVG(grade) avg_grade
FROM student, student_module
WHERE student.student_id = student_module.student_id
AND student.student_id < 120000001
GROUP BY first_name, last_name;
In your original query, as a previous poster pointed out, there is no join condition, so it would just average all the grades together regardless of the student_id. The result would show everyone with the same average grade.
When you use aggregate functions inside a view, you also need to give the resulting column an alias so that it has a valid name you can reference when querying the view. In this case I assigned it avg_grade.
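Here is a runnable sketch of such a view using Python's built-in sqlite3 module. The table layout (student_id in both tables) and the sample rows are assumptions, just like in the answers above. I also group by student_id so that two students who share a name are not averaged together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed table layout and made-up rows.
CREATE TABLE student (student_id INTEGER, first_name TEXT, last_name TEXT);
CREATE TABLE student_module (student_id INTEGER, grade REAL);
INSERT INTO student VALUES (1, 'Ada', 'Kay'), (2, 'Ben', 'Orr');
INSERT INTO student_module VALUES (1, 60), (1, 80), (2, 50);

CREATE VIEW uos_vu_student_average AS
SELECT s.first_name, s.last_name, AVG(m.grade) AS avg_grade
FROM student s
INNER JOIN student_module m ON m.student_id = s.student_id
WHERE s.student_id < 120000001
GROUP BY s.student_id, s.first_name, s.last_name;
""")

# The alias avg_grade is what lets us name the column when querying the view.
rows = conn.execute(
    "SELECT first_name, last_name, avg_grade "
    "FROM uos_vu_student_average ORDER BY last_name"
).fetchall()

print(rows)  # [('Ada', 'Kay', 70.0), ('Ben', 'Orr', 50.0)]
```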
I have a table something like this:
ID|Value
01|1
02|4
03|12
01|5
02|14
03|22
01|9
02|32
02|62
01|13
03|92
I want to know how much progress each ID has made (relative to its initial, i.e. minimal, value).
In Sybase I can type:
select ID, (value-min(value)) from table group by id;
ID|Value
01|0
01|4
01|8
01|12
02|0
02|10
02|28
02|58
03|0
03|10
03|80
But MonetDB does not support this (I am not sure, but maybe because it follows SQL:99).
There, GROUP BY only gives one row per ID, or maybe an average of the other values, but not the desired result.
Is there an alternative to GROUP BY in MonetDB?
You can achieve this with a self join. The idea is that you build a subselect that gives you the minimum value for each id, and then join that to the original table by id.
SELECT a.id, a.value-b.min_value
FROM "table" a INNER JOIN
(SELECT id, MIN(value) AS min_value FROM "table" GROUP BY id) AS b
ON a.id = b.id;
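To check the self join against the data from the question, here is a small sketch using Python's built-in sqlite3 module (SQLite, not MonetDB). The table name measurements is a hypothetical replacement for "table", and the ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (id TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?)",
    [
        ("01", 1), ("02", 4), ("03", 12), ("01", 5), ("02", 14), ("03", 22),
        ("01", 9), ("02", 32), ("02", 62), ("01", 13), ("03", 92),
    ],
)

# Join every row to the per-id minimum and subtract it.
rows = conn.execute("""
SELECT a.id, a.value - b.min_value AS progress
FROM measurements a
INNER JOIN (
    SELECT id, MIN(value) AS min_value
    FROM measurements
    GROUP BY id
) AS b ON a.id = b.id
ORDER BY a.id, progress
""").fetchall()

for row_id, progress in rows:
    print(f"{row_id}|{progress}")
```

For the sample data this prints the same ID|Value pairs as the Sybase output in the question.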