Group by specified column in PostgreSQL - ruby

Maybe this question is a little stupid, but I'm confused.
How do I group records by a specified column? :)
Item.group(:category_id)
doesn't work...
It says:
ActiveRecord::StatementInvalid: PGError: ERROR: column "items.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT "items".* FROM "items" GROUP BY category_id
What kind of aggregate function should I use?
Could you please provide a simple example?

You will have to define how to combine the values that share the same category_id. Concatenate them? Calculate a sum?
To create comma-separated lists of values, your statement could look like this:
SELECT category_id
,string_agg(col1, ', ') AS col1_list
,string_agg(col2, ', ') AS col2_list
FROM items
GROUP BY category_id
You need Postgres 9.0 or later for string_agg(col1, ', ').
In older versions you can substitute array_to_string(array_agg(col1), ', '). More aggregate functions are listed here.
Aggregating values in PostgreSQL is clearly superior to aggregating them in the client: Postgres is very fast at this, and it reduces (network) traffic.
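For a quick runnable illustration, here is a sketch that uses Python's built-in sqlite3 module instead of PostgreSQL (an assumption made only so the example runs anywhere). SQLite has no string_agg, so its group_concat(x, separator) stands in for it; the items table and its name column are hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical items table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, category_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO items (category_id, name) VALUES (?, ?)",
    [(1, "apple"), (1, "pear"), (2, "carrot")],
)

# group_concat(name, ', ') plays the role of Postgres string_agg(name, ', ').
# The order of values inside each list is not guaranteed, so sort them in Python.
rows = conn.execute(
    "SELECT category_id, group_concat(name, ', ') AS name_list "
    "FROM items GROUP BY category_id ORDER BY category_id"
).fetchall()

grouped = {cat: sorted(names.split(", ")) for cat, names in rows}
print(grouped)  # {1: ['apple', 'pear'], 2: ['carrot']}
```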

You can use sum, avg, count or any other aggregate function. You can find more on this topic here.
But it seems that you don't really need SQL grouping.
Try fetching all records and then use Enumerable#group_by to group the Items by category_id, e.g. Item.all.group_by(&:category_id).
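If you go the client-side route, the grouping itself is just building a map from category_id to its records. A minimal Python sketch of that idea, with hypothetical rows standing in for what the query would return (in Ruby, Enumerable#group_by does the same job):

```python
from collections import defaultdict

# Hypothetical rows as the client would receive them: (id, category_id, name).
records = [(1, 1, "apple"), (2, 1, "pear"), (3, 2, "carrot")]

# Client-side grouping: a dict keyed by category_id, each value a list of names.
by_category = defaultdict(list)
for record_id, category_id, name in records:
    by_category[category_id].append(name)

print(dict(by_category))  # {1: ['apple', 'pear'], 2: ['carrot']}
```

Keep in mind that this transfers every row to the client, which is exactly the traffic that server-side aggregation avoids.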

Grouping in SQL means that the server collapses one or more records from the table into one resulting row. So if, for example, you group by category_id, several records may match a given category, and you can't expect the database to return all columns from the table (which is what SELECT * asks for).
Instead, when you use GROUP BY, you can SELECT only:
columns you have grouped by, and/or
aggregate functions which are performed on all the records belonging to a resulting group
Depending on exactly what you need, modify your .select accordingly.
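A sketch of a query that follows both rules, again in Python with sqlite3 and a hypothetical price column. SQLite itself is more permissive than PostgreSQL here, so this only shows the valid form, not the error:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, category_id INTEGER, price REAL)")
conn.executemany(
    "INSERT INTO items (category_id, price) VALUES (?, ?)",
    [(1, 2.0), (1, 3.0), (2, 5.0)],
)

# Every selected expression is either the grouping column or an aggregate,
# so the query also satisfies PostgreSQL's strict GROUP BY rule.
rows = conn.execute(
    "SELECT category_id, COUNT(*) AS item_count, SUM(price) AS total_price "
    "FROM items GROUP BY category_id ORDER BY category_id"
).fetchall()
print(rows)  # [(1, 2, 5.0), (2, 1, 5.0)]
```

In ActiveRecord, the corresponding call would presumably be Item.select("category_id, COUNT(*) AS item_count, SUM(price) AS total_price").group(:category_id), with price again being a hypothetical column.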

Related

How to make a dynamic select based on a result set from previous step on Pentaho Kettle?

I want to execute a select statement based on a result set from a previous step, something like this:
select column from table where column in (previous step);
Basically, this step (Filter rows) splits a group of ids based on a condition. I want to run a select using the ids that tested false, but I don't know how to select only those. The table I want to select from is very big, and it is very expensive to select all of its records and join them with the result set, so I want to select just the group I need. Is this even possible?
https://i.stack.imgur.com/Xu1qt.png
OK, let me try to be more specific.
Basically, I have 3 steps, as my screenshot shows.
The first step is a Table input, where I select from a table.
The second step is a Database lookup, where I look up some fields I want in another table.
The third step is a Filter rows, where I effectively make an if/else statement.
After the third step (Filter rows) I have 2 streams: True and False.
Each stream returns a group of ids and other fields too, but those aren't important here, I guess.
I want to make a select statement based on the ids returned from the previous step (the 3rd step, Filter rows).
Basically, the behaviour I want is similar to this query:
select *
from table
where id in ("previous step");
The table will always be the same, so I don't think that will be a problem.
And "previous step" means all ids returned after the 3rd step (Filter rows).
What I am doing right now: I have another Table input on the other side, which I merge join with this result set (from the 3rd step). But that means selecting the entire table and then joining it with my result set, which is very expensive, and I'm wondering if I can get the same result with better performance.
I don't know if I am being clear enough, and I apologize in advance because English is not my first language, but I hope you can understand me now. Thanks.
You can use three steps to achieve this.
First, use a Memory group by step to group the ids into one field. The aggregate type should be "Concatenate strings separated by ,".
Second, use a User defined java expression step to generate a new field containing the SQL we need. The expression may look like "SELECT id,created FROM test WHERE order_id IN ("+ ids +")", where ids is the grouped result from the previous step.
Finally, use a Dynamic SQL row step to look up the data with the generated SQL.
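The three steps can be mimicked outside Kettle to check the generated SQL. This is a Python/sqlite3 sketch, not Pentaho code; the test table follows the column names in the expression above, and the id values are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table matching the columns used in the expression above.
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, created TEXT, order_id INTEGER)")
conn.executemany(
    "INSERT INTO test (created, order_id) VALUES (?, ?)",
    [("2020-01-01", 10), ("2020-01-02", 11), ("2020-01-03", 12)],
)

# Step 1 (Memory group by, "Concatenate strings separated by ,"):
# ids arriving on the False stream, collapsed into one comma-separated value.
false_stream_ids = [10, 12]
ids = ",".join(str(int(i)) for i in false_stream_ids)  # "10,12"

# Step 2 (User defined java expression): build the SQL text.
sql = "SELECT id, created FROM test WHERE order_id IN (" + ids + ") ORDER BY id"

# Step 3 (Dynamic SQL row): run the generated query, reading only the needed rows.
rows = conn.execute(sql).fetchall()
print(rows)  # [(1, '2020-01-01'), (3, '2020-01-03')]
```

Concatenating values into SQL text is only safe here because the ids are cast to integers; with string ids, proper quoting or bind parameters would be needed. The ORDER BY id is added only to make the output deterministic.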

ORA-00979 not a Group By function error

I am trying to select 2 values, emp_name and emp_location, from the Employee table, grouping by emp_location. I am aware that the non-aggregated columns in the select clause need to appear in the GROUP BY clause, but I would like to know whether there is any other way to get these values in a single query.
My intention is to select only one employee per location, based on age.
Sample query:
select emp_name,emp_location
from Employee
where emp_age=25
group by emp_location
Please help.
Thanks a lot to everyone who responded to this question. I will try to learn these window functions, as they are very handy.
The reason this works in MySQL and not in Oracle is that in Oracle, as in most other databases, every selected field (or expression) must either be listed in the GROUP BY clause or be an aggregation that combines all values in the group into a single one. For instance, this would work:
select max(emp_name),emp_location
from Employee
where emp_age=25
group by emp_location
However, it may not be the best solution. It works if you want just the name, but you'll get into trouble when you want multiple fields for an employee. In that case max won't do the trick: in the query below, you might get a first name that doesn't match the last name.
select max(emp_firstname), max(emp_lastname), emp_location
from Employee
where emp_age=25
group by emp_location
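To see the mismatch concretely, here is a small sqlite3 sketch with two made-up employees at the same location (emp_firstname and emp_lastname are the columns from the query above; the rows are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (emp_id INTEGER PRIMARY KEY, emp_firstname TEXT, "
    "emp_lastname TEXT, emp_location TEXT, emp_age INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee (emp_firstname, emp_lastname, emp_location, emp_age) "
    "VALUES (?, ?, ?, ?)",
    [("Ann", "Zed", "Paris", 25), ("Bob", "Young", "Paris", 25)],
)

# Each max() is computed independently, so the two values can come from different rows.
row = conn.execute(
    "SELECT max(emp_firstname), max(emp_lastname), emp_location "
    "FROM Employee WHERE emp_age = 25 GROUP BY emp_location"
).fetchone()
print(row)  # ('Bob', 'Zed', 'Paris'): nobody is actually called Bob Zed
```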
One solution for this is a window function (analytic function). With these, you can compute a value for each record without reducing the number of records. For instance, with a windowed max function, you could select the maximum age of people named John and display that value next to every John in the result, even those who are not that age.
Some functions, like rank, dense_rank and row_number, can be used to generate a number for each employee, which you can then filter on. In the example below, I created such a counter per location (partition by), ordered in this case by name and id. You can specify other fields as well; for instance, if you want one name per age per location, specify both age and location in partition by. If you want the oldest employee at each location, you can remove where emp_age=25 and order by emp_age desc instead.
select
*
from
(select
emp_name, emp_location,
dense_rank() over (partition by emp_location order by emp_name, emp_id) as emp_rank
from Employee
where emp_age=25)
where
emp_rank = 1
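The same ranking query, run against a throwaway SQLite table from Python (SQLite 3.25 or later is assumed, since that is the version that added window functions; the data is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (emp_id INTEGER PRIMARY KEY, emp_name TEXT, "
    "emp_location TEXT, emp_age INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee (emp_name, emp_location, emp_age) VALUES (?, ?, ?)",
    [("Carl", "Paris", 25), ("Ann", "Paris", 25), ("Dora", "Rome", 25), ("Eve", "Rome", 40)],
)

# Number the 25-year-olds per location by name (then id), keep rank 1 in each location.
rows = conn.execute(
    """
    SELECT emp_name, emp_location
    FROM (SELECT emp_name, emp_location,
                 dense_rank() OVER (PARTITION BY emp_location
                                    ORDER BY emp_name, emp_id) AS emp_rank
          FROM Employee
          WHERE emp_age = 25)
    WHERE emp_rank = 1
    ORDER BY emp_location
    """
).fetchall()
print(rows)  # [('Ann', 'Paris'), ('Dora', 'Rome')]
```

Because emp_id is unique, dense_rank yields exactly one rank-1 row per location here; row_number would give the same guarantee even without the emp_id tie-breaker.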
ORA-00979 not a Group By function error
Only aggregate functions and columns specified in the GROUP BY clause are allowed in the SELECT clause.
In that regard, Oracle follows the SQL standard closely. But, as you noticed in your comment, some other RDBMSs are less strict than Oracle on that point. For example, to quote MySQL's documentation:
MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. [...]
However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate.
So, in the recommended use case, adding the extra columns to the GROUP BY clause will lead to the same result.
select emp_name,emp_location
-- ^^^^^^^^
-- this is *not* part of the `GROUP BY` clause
from Employee
where emp_age=25
group by emp_location
Maybe you are looking for:
...
group by emp_location, emp_name
select emp_name,emp_location
from Employee
where emp_age=25
group by emp_name,emp_location
or
select max(emp_name) emp_name,emp_location
from Employee
where emp_age=25
group by emp_location
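The difference between these variants is easy to check in SQLite, which (like MySQL) tolerates the bare emp_name that Oracle rejects. A Python sketch with made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (emp_name TEXT, emp_location TEXT, emp_age INTEGER)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [("Ann", "Paris", 25), ("Carl", "Paris", 25), ("Dora", "Rome", 25)],
)

# Accepted by SQLite, rejected by Oracle with ORA-00979.
# Which name is returned for Paris is indeterminate.
lenient = conn.execute(
    "SELECT emp_name, emp_location FROM Employee "
    "WHERE emp_age = 25 GROUP BY emp_location ORDER BY emp_location"
).fetchall()

# Valid everywhere, but one row per (name, location) pair, not one per location.
both = conn.execute(
    "SELECT emp_name, emp_location FROM Employee "
    "WHERE emp_age = 25 GROUP BY emp_name, emp_location ORDER BY emp_location, emp_name"
).fetchall()

# Valid everywhere and deterministic: the alphabetically last name per location.
with_max = conn.execute(
    "SELECT max(emp_name) AS emp_name, emp_location FROM Employee "
    "WHERE emp_age = 25 GROUP BY emp_location ORDER BY emp_location"
).fetchall()

print(lenient)   # the Paris row holds either 'Ann' or 'Carl'
print(both)      # [('Ann', 'Paris'), ('Carl', 'Paris'), ('Dora', 'Rome')]
print(with_max)  # [('Carl', 'Paris'), ('Dora', 'Rome')]
```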

Select distinct results, when using with() operator in Yii

How can I select only distinct records from a relational table when using the with() operator in Yii?
I'm getting my models (records) like this:
$probe = Probes::model()->with(array
(
'user',
'results',
'results.answer',
'survey',
'survey.questions',
'survey.questions.question',
'survey.questions.question.answers',
'manager'
))->findByPk($id);
I want to make sure that the survey.questions relation returns only distinct records. But I don't see any way to achieve this (or I'm blind / not educated enough).
When giving a relational table name / alias as an array:
'results.question'=>array('alias'=>'results_question'),
the distinct key is not among the keys that can be used in such an array (as a modifier).
I tried a very ugly, bumpy way of changing the select from the default * to DISTINCT *:
'survey.questions'=>array('select'=>'distinct'),
But this (of course?) failed:
Active record "SurveysQuestions" is trying to select an invalid column "distinct". Note, the column must exist in the table or be an expression with alias.
How can I achieve this (it seemed so obvious and easy), if it is possible at all this way (using with())? If not, please advise how to get distinct records from a relational table in any other way (other than manually filtering the results with foreach, which is what I'm doing right now, and which is ugly).
You could set CDbCriteria::distinct to true:
'survey.questions'=>array('distinct'=>true),

How can I use having without group by?

I am reading about the HAVING clause in Oracle.
The docs say that:
If there is no GROUP BY clause, the HAVING clause is applied to the entire result as a single group.
But whenever I try to use a HAVING clause without GROUP BY, I get a syntax error.
How can I use HAVING without GROUP BY?
Can somebody explain it to me using this schema?
SQL fiddle
A simple experiment will prove that this is possible:
select * from dual having 1=1
This query will run successfully in Oracle 11g. I suspect the problem you're seeing is that you're referencing a non-aggregated column in the select list or the having clause, which is not allowed without a group by clause.
While it's clearly possible to use having without group by, I don't really see much point. Any condition that doesn't involve an aggregate would be more appropriate in the where clause.
HAVING without GROUP BY is valid, but also consider the points below.
If there is no GROUP BY clause, the HAVING clause is applied to the entire result as a single group.
booleanExpression in HAVING can contain only grouping columns, columns that are part of aggregate expressions, and columns that are part of a subquery.
So when you use HAVING without GROUP BY, the following are not valid:
1) SELECT col_x ... /* since col_x is not part of a GROUP BY */
or
2) HAVING col_x ... (in a boolean expression) /* since col_x is not part of a GROUP BY */
Instead of col_x, what you can have in both the HAVING and SELECT clauses is an aggregate function on col_x, or a constant.
Do not assume that treating the whole table as one group means GROUP BY col_1, ..., col_n.

Oracle Select Query, Order By + Limit Results

I am new to Oracle and working with a fairly large database. I would like to run a query that selects the desired columns, orders by a certain column, and limits the results. According to everything I have read, the query below should work, but it returns "ORA-00918: column ambiguously defined":
SELECT * FROM(SELECT * FROM EAI.EAI_EVENT_LOG e,
EAI.EAI_EVENT_LOG_MESSAGE e1 WHERE e.SOURCE_URL LIKE '%.XML'
ORDER BY e.REQUEST_DATE_TIME DESC) WHERE ROWNUM <= 20
Any suggestions would be greatly appreciated :D
The error message means your result set contains two columns with the same name. Each column in a query's projection needs to have a unique name. Presumably you have a column (or columns) with the same name in both EAI_EVENT_LOG and EAI_EVENT_LOG_MESSAGE.
You also want to join on that column. At the moment you are generating a cross join between the two tables. In other words, if you have a hundred records in EAI_EVENT_LOG and two hundred records in EAI_EVENT_LOG_MESSAGE, your result set will be twenty thousand records (without the rownum). This is probably not your intention.
"By switching to an inner join, will that eliminate the error with the current code?"
No, you'll still need to handle having two columns with the same name. Basically, this comes from using SELECT * on multiple tables. SELECT * is bad practice: it's convenient, but it is always better to specify the exact columns you want in the query's projection. That way you can include (say) e.TRANSACTION_ID and exclude e1.TRANSACTION_ID, and avoid the ORA-00918 exception.
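The hundred-times-two-hundred arithmetic can be checked with a sketch like this, using simplified stand-in tables (the table and column names are assumptions) with 100 events and 200 messages:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-ins for EAI_EVENT_LOG and EAI_EVENT_LOG_MESSAGE.
conn.execute("CREATE TABLE event_log (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE event_log_message (event_id INTEGER)")
conn.executemany("INSERT INTO event_log (id) VALUES (?)", [(i,) for i in range(1, 101)])
# Two messages for each of the 100 events: 200 rows.
conn.executemany(
    "INSERT INTO event_log_message (event_id) VALUES (?)",
    [(i,) for i in range(1, 101) for _ in range(2)],
)

# Comma-separated FROM with no join condition: every event pairs with every message.
cross = conn.execute("SELECT COUNT(*) FROM event_log e, event_log_message e1").fetchone()[0]
# With a join condition, each message pairs only with its own event.
inner = conn.execute(
    "SELECT COUNT(*) FROM event_log e JOIN event_log_message e1 ON e.id = e1.event_id"
).fetchone()[0]
print(cross, inner)  # 20000 200
```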
Maybe you have some columns with identical names in both the EAI_EVENT_LOG and EAI_EVENT_LOG_MESSAGE tables? Instead of SELECT *, list all the columns you want to select.
Another problem I see is that you are selecting from two tables but not joining them in the WHERE clause, hence the result set will be the cross product of the two tables.
You need to stop using SQL-89 implicit join syntax.
Not because it doesn't work, but because it is evil.
Right now you have a cross join, which in 99.9% of cases is not what you want.
Also, give every sub-select its own alias (Oracle doesn't require one, but it makes the query clearer and many other databases do require it).
SELECT * FROM
(SELECT e.*, e1.* FROM EAI.EAI_EVENT_LOG e
INNER JOIN EAI.EAI_EVENT_LOG_MESSAGE e1 on (......)
WHERE e.SOURCE_URL LIKE '%.XML'
ORDER BY e.REQUEST_DATE_TIME DESC) s WHERE ROWNUM <= 20
Please specify a join criterion on the dotted line.
Normally you join on a key field, e.g. ON (e.id = e1.event_id).
It's a bad idea to use select *; it's better to specify exactly which fields you want:
SELECT e.field1 as customer_id
,e.field2 as customer_name
.....
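Putting these suggestions together, here is a runnable sketch in SQLite via Python. The table and column names are simplified stand-ins for the EAI tables, and the join key follows the e.id = e1.event_id example. SQLite has no ROWNUM, so LIMIT takes its place; on Oracle 12c and later, FETCH FIRST 20 ROWS ONLY after the ORDER BY does the same job, while older versions need the ROWNUM wrapper shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event_log (id INTEGER PRIMARY KEY, source_url TEXT, request_date_time TEXT)"
)
conn.execute(
    "CREATE TABLE event_log_message (id INTEGER PRIMARY KEY, event_id INTEGER, body TEXT)"
)
conn.executemany(
    "INSERT INTO event_log (id, source_url, request_date_time) VALUES (?, ?, ?)",
    [(1, "a.XML", "2020-01-01"), (2, "b.XML", "2020-01-03"), (3, "c.txt", "2020-01-02")],
)
conn.executemany(
    "INSERT INTO event_log_message (event_id, body) VALUES (?, ?)",
    [(1, "first"), (2, "second"), (3, "third")],
)

# Explicit, aliased columns avoid the duplicate "id" names that SELECT * would produce;
# the INNER JOIN replaces the accidental cross join; LIMIT caps the sorted result.
rows = conn.execute(
    """
    SELECT e.id AS event_id, e.request_date_time, e1.body AS message_body
    FROM event_log e
    INNER JOIN event_log_message e1 ON e.id = e1.event_id
    WHERE e.source_url LIKE '%.XML'
    ORDER BY e.request_date_time DESC
    LIMIT 20
    """
).fetchall()
print(rows)  # [(2, '2020-01-03', 'second'), (1, '2020-01-01', 'first')]
```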
