Logic of applying Group by in Sql Queries - linq

I know this is a beginner-level question, but I always get confused when I use an aggregate function with GROUP BY. I am getting the right results, but I am not sure how GROUP BY is working here. My requirement is to get the count of sent items, based on the MessageGroup column.
MessageId  SenderId  MessageGroup                          Message
---------  --------  ------------------------------------  -------
1          2         67217969-e03d-41ec-863e-659ca26e660f  Hi
2          2         67217969-e03d-41ec-863e-659ca26e660f  Hello
3          2         67217969-e03d-41ec-863e-659ca26e660f  bye
4          1         c45dc414-9320-40a5-8f8f-9c960d6deffe  TC
5          1         8486d16b-294b-45a5-8674-e7024e55f39b  shutup
I want to get the count of sent messages. Here SenderId = 2 has sent three messages to someone, but I want to show a single count, so I group by MessageGroup and count the groups.
I have used this LINQ query:
return DB.tblMessage.Where(m => m.SenderId == 2 ).GroupBy(m => m.MessageGroup).Count();
This returns 1, which is correct; I want to show (1) in sent messages.
But if I run what I think is the same query in SQL Server, it returns 3.
Here is my SQL query:
select count(*)
from tblMessage
where SenderId = 2
group by MessageGroup
The LINQ query is right, as it returns 1, as Microsoft says here.
I am confused about GROUP BY. Please clarify this for me.

When you use GROUP BY, include the grouped columns in the SELECT clause so that each result row shows which group its count belongs to:
select MessageGroup, count(MessageGroup)
from tblMessage
where SenderId = 2
group by MessageGroup

You want to include MessageGroup as part of the select, like this:
select MessageGroup, count(*)
from tblMessage
where SenderId=2
group by MessageGroup
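The difference between the two results in the question can be reproduced with a small sketch. It uses Python's sqlite3 module as a stand-in for SQL Server (an assumption made only so the example runs anywhere) and loads the five rows from the question. GROUP BY returns one row per group, and COUNT(*) counts the rows inside each group, which is why SQL reports 3. The LINQ GroupBy(...).Count() counts the groups themselves, which COUNT(DISTINCT ...) approximates here.

```python
import sqlite3

# In-memory stand-in for tblMessage, loaded with the rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblMessage (MessageId INTEGER, SenderId INTEGER, "
    "MessageGroup TEXT, Message TEXT)"
)
conn.executemany(
    "INSERT INTO tblMessage VALUES (?, ?, ?, ?)",
    [
        (1, 2, "67217969-e03d-41ec-863e-659ca26e660f", "Hi"),
        (2, 2, "67217969-e03d-41ec-863e-659ca26e660f", "Hello"),
        (3, 2, "67217969-e03d-41ec-863e-659ca26e660f", "bye"),
        (4, 1, "c45dc414-9320-40a5-8f8f-9c960d6deffe", "TC"),
        (5, 1, "8486d16b-294b-45a5-8674-e7024e55f39b", "shutup"),
    ],
)

# One row per group; COUNT(*) is the number of rows inside each group.
rows_per_group = conn.execute(
    "SELECT MessageGroup, COUNT(*) FROM tblMessage "
    "WHERE SenderId = 2 GROUP BY MessageGroup"
).fetchall()

# Number of groups, which is what counting the LINQ groups gives.
group_count = conn.execute(
    "SELECT COUNT(DISTINCT MessageGroup) FROM tblMessage WHERE SenderId = 2"
).fetchone()[0]

print(rows_per_group)
print(group_count)
```

With this data the grouped query yields a single row whose count is 3, while the number of groups is 1.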

Related

Hive Getting error on group by column while using case statements and aggregations

I am working on a query in Hive that uses aggregations such as sum, a case statement, and a group by clause. I have changed the column and table names, but the logic is the same as in my project:
select
empname,
empsal,
emphike,
sum(empsal) as tot_sal,
sum(emphike) as tot_hike,
case when tot_sal > 1000 then exp(tot_hike)
else 0
end as manager
from employee
group by
empname,
empsal,
emphike
For the above query I got the error "Expression not in group by key '1000'".
So I modified the query slightly and tried again. My second query is:
select
empname,
empsal,
emphike,
sum(empsal) as tot_sal,
sum(emphike) as tot_hike,
case when sum(empsal) > 1000 then exp(sum(emphike))
else 0
end as manager
from employee
group by
empname,
empsal,
emphike
For this query I get the error "Expression not in group by key 'Manager'".
When I add manager to the group by, it reports an invalid alias.
Please help me out here.
I see three issues in your query:
1.) Hive cannot group by a column alias that you defined in the same select block, because the alias is not visible at that point. You will probably need a subquery for that.
2.) Hive tends to show errors when sum or count operations are not at the end of the query.
3.) Although I do not know what your goal is, I think that your query will not deliver the desired result. If you group by empsal, there is no difference between empsal and sum(empsal) for any group that contains a single row. The same goes for emphike and sum(emphike).
I think the following query might solve these issues:
select
  a.empname,
  a.tot_sal,
  a.tot_hike,
  if(a.tot_sal > 1000, exp(a.tot_hike), 0) as manager
from
  (select
     empname,
     sum(empsal) as tot_sal,
     sum(emphike) as tot_hike
   from employee
   group by
     empname
  ) a
The if statement is equivalent to your case statement, however I find it a bit easier to read.
In this example you wouldn't need to group by after the subquery because the grouping is done in the subquery a.
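The aggregate-in-a-subquery pattern can be tried outside Hive with a minimal sketch. It assumes SQLite (via Python's sqlite3) instead of Hive, so it uses CASE rather than Hive's if(), registers exp() by hand because not every SQLite build ships math functions, and uses made-up sample rows.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# Register exp() explicitly; this is an assumption about the SQLite build, not Hive behavior.
conn.create_function("exp", 1, math.exp)
conn.execute("CREATE TABLE employee (empname TEXT, empsal REAL, emphike REAL)")
# Hypothetical sample data, not taken from the question.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("ann", 600, 1.0), ("ann", 500, 1.0), ("bob", 300, 2.0)],
)

# The inner query aggregates per employee; the outer query can then use the
# aliases tot_sal and tot_hike like ordinary columns.
query = """
SELECT a.empname, a.tot_sal, a.tot_hike,
       CASE WHEN a.tot_sal > 1000 THEN exp(a.tot_hike) ELSE 0 END AS manager
FROM (
    SELECT empname, SUM(empsal) AS tot_sal, SUM(emphike) AS tot_hike
    FROM employee
    GROUP BY empname
) a
ORDER BY a.empname
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Only "ann" exceeds the 1000 threshold in this sample, so only her manager value is non-zero.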

select and calculate numdays from two tables in oracle

I'm doing a leave calculation. I have a leave request table and an employee table.
Their relationship is that an employee can request many leaves, i.e.
the leave request table references the employee through EMPLOYEE_SERIAL_ID (one to many). I wrote the following query to select all leave requests and calculate the number of days.
SELECT (LR.DATE_TO - LR.DATE_FROM) as NumDays ,
LR.EMPLOYEE_SERIAL_ID, LR.ID as LEAVE_REQUEST_ID
FROM TBL_LEAVE_REQUEST LR ;
NUMDAYS  EMPLOYEE_SERIAL_ID  LEAVE_REQUEST_ID
-------  ------------------  ----------------
3        EMP_286             LEAVE_35
2        EMP_243             LEAVE_36
2        EMP_284             LEAVE_37
3        EMP_243             LEAVE_38
32       EMP_243             LEAVE_39
0        EMP_303             LEAVE_40
1        EMP_241             LEAVE_41
But I figured out that employees who have not requested any leave are not selected by this query.
I want to modify the query so that if an employee has requested leave it shows the NumDays, and if not it returns NumDays 0, so that all employees are listed.
Numdays
You'll need to left join your leave request table to your actual employee table. This will give you an employee record, even if they don't have a leave request.
Since you haven't posted your schema, and you haven't specified what database you're actually using, I can't write much of the query for you. Your logic will look something like this:
SELECT
T.EMPLOYEE_ID
, ISNULL((LR.DATE_TO - LR.DATE_FROM), 0) as NumDays
, LR.EMPLOYEE_SERIAL_ID
, LR.ID as LEAVE_REQUEST_ID
FROM
TBL_EMPLOYEE T
LEFT JOIN TBL_LEAVE_REQUEST LR
on T.EMPLOYEE_ID = LR.EMPLOYEE_ID
;
The ISNULL function is used by MSSQL Server. Other databases require different functions.
If you're using Oracle, replace ISNULL( with NVL(.
If you're using PostgreSQL or MySQL, you'll want the function COALESCE(.
A note on ISNULL() and NVL() vs COALESCE(): as @Ronnis pointed out, any ANSI-compliant database should support the COALESCE() function.
Looking into the documentation a little further, you may get better query performance using COALESCE() than NVL() or ISNULL(). The former will short-circuit its evaluation, whereas the other two will not.
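The LEFT JOIN plus COALESCE idea can be checked with a small sketch. It assumes SQLite (via Python's sqlite3) rather than Oracle, so the date difference is computed with julianday() instead of plain date subtraction. The table layouts, the employee EMP_999, and the dates are hypothetical, since the real schema was not posted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal schema; column names follow the question where known.
conn.executescript("""
CREATE TABLE tbl_employee (employee_serial_id TEXT);
CREATE TABLE tbl_leave_request (
    id TEXT, employee_serial_id TEXT, date_from TEXT, date_to TEXT
);
INSERT INTO tbl_employee VALUES ('EMP_243'), ('EMP_999');
INSERT INTO tbl_leave_request
    VALUES ('LEAVE_36', 'EMP_243', '2020-01-01', '2020-01-03');
""")

# LEFT JOIN keeps every employee; COALESCE turns the missing day count into 0.
rows = conn.execute("""
SELECT e.employee_serial_id,
       lr.id AS leave_request_id,
       COALESCE(julianday(lr.date_to) - julianday(lr.date_from), 0) AS numdays
FROM tbl_employee e
LEFT JOIN tbl_leave_request lr
       ON lr.employee_serial_id = e.employee_serial_id
ORDER BY e.employee_serial_id
""").fetchall()

for row in rows:
    print(row)
```

The employee without a leave request still appears, with a NULL leave request id and 0 days.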

ActiveRecord query help. Finding the latest record from each group

So I have an orders table that looks like this:
Where ledger_id is a uuid and version is a timestamp.
There could be many orders per ledger_id. This is a denormalized table btw, used to keep track of orders and their progression through processing FWIW.
If a couple ledger_ids come in and we want the latest order for each ledger_id, what's the ActiveRecord query that will get us this?
I feel like I'm close. I have this:
orders = Order.where(ledger_id: ledger_ids).group(:ledger_id, :id).having('version = MIN(version)').first
where ledger_ids is an array of 1 or more ledger uuids.
But this gives us an error:
:StatementInvalid: PG::GroupingError: ERROR: column "orders.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT "orders".* FROM "orders" WHERE "orders"."ledger_id" ...
^
: SELECT "orders".* FROM "orders" WHERE "orders"."ledger_id" = $1 GROUP BY "orders"."ledger_id" ORDER BY "orders"."id" ASC LIMIT $2
Anyone know of a solution?
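One common way to express "latest row per group" in plain SQL is to join the table to a subquery that finds MAX(version) per ledger_id. The sketch below shows that idea with Python's sqlite3 rather than ActiveRecord and PostgreSQL; the sample rows are made up, and it assumes that "latest" means the greatest version (so MAX, not the MIN used in the attempt above) and that versions are unique within a ledger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; version is stored as sortable timestamp text.
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, ledger_id TEXT, version TEXT);
INSERT INTO orders (id, ledger_id, version) VALUES
  (1, 'ledger-a', '2020-01-01 10:00:00'),
  (2, 'ledger-a', '2020-01-02 10:00:00'),
  (3, 'ledger-b', '2020-01-01 09:00:00');
""")

ledger_ids = ["ledger-a", "ledger-b"]
placeholders = ", ".join("?" for _ in ledger_ids)

# The subquery finds the newest version per ledger; the join picks the full row.
latest = conn.execute(f"""
SELECT o.id, o.ledger_id, o.version
FROM orders o
JOIN (
    SELECT ledger_id, MAX(version) AS max_version
    FROM orders
    WHERE ledger_id IN ({placeholders})
    GROUP BY ledger_id
) newest
  ON newest.ledger_id = o.ledger_id AND newest.max_version = o.version
ORDER BY o.ledger_id
""", ledger_ids).fetchall()

for row in latest:
    print(row)
```

Because the outer query selects whole rows and only the subquery groups, no non-grouped column ends up in a GROUP BY query, which is what the PG::GroupingError complains about.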

Using rownum in subquery

In an algorithm, the user passes a query, for instance:
SELECT o_orderdate, o_orderpriority FROM h_orders WHERE rownum <= 5
The query returns the following:
1996-01-02 5-LOW
1996-12-01 1-URGENT
1993-10-14 5-LOW
1995-10-11 5-LOW
1994-07-30 5-LOW
The algorithm needs the count for the select attributes (o_orderdate, o_orderpriority in the above example) and therefore it rewrites the query to:
SELECT o_orderdate, count(o_orderdate) FROM
(SELECT o_orderdate, o_orderpriority FROM h_orders WHERE rownum <= 5)
GROUP BY o_orderdate
This query returns the following:
1992-01-01 5
However the intended result is:
1996-12-01 1
1995-10-11 1
1994-07-30 1
1996-01-02 1
1993-10-14 1
Any idea how I could rewrite the parsing stage or how the user could pass a syntactically different query to receive the above results?
The rows returned by the inner query are essentially non-deterministic, as they depend on the order in which the optimiser identifies rows as part of the required data set. A change in execution plan due to modified predicates might change the order in which the rows come back, and new rows added to the table can also change which rows are included.
If you always want n rows, you can use distinct(o_orderdate) in the inner query, although that renders the GROUP BY useless because every count will be 1.
Or you can add another outer select with rownum to get n of the grouped rows, like this:
select o_orderdate, counter from
(
SELECT o_orderdate, count(o_orderdate) as counter FROM
(SELECT o_orderdate, o_orderpriority FROM h_orders)
GROUP BY o_orderdate
)
WHERE rownum <= 5
Although the results will most likely be useless, as they will be non-deterministic (as mentioned by David Aldridge).
As your outer query makes no use of "o_orderpriority", why not just get rid of the subquery and simply query like this:
SELECT o_orderdate, count(o_orderdate) AS order_count
FROM h_orders
WHERE rownum <= 5
GROUP BY o_orderdate
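The difference between limiting before grouping and limiting after grouping can be seen in a short sketch. It uses SQLite's LIMIT as a stand-in for Oracle's rownum (an assumption, since SQLite has no rownum), adds an ORDER BY so the result is deterministic, and uses made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE h_orders (o_orderdate TEXT, o_orderpriority TEXT)")
# Hypothetical sample rows: three orders share the earliest date.
conn.executemany(
    "INSERT INTO h_orders VALUES (?, ?)",
    [
        ("1992-01-01", "5-LOW"),
        ("1992-01-01", "1-URGENT"),
        ("1992-01-01", "5-LOW"),
        ("1993-10-14", "5-LOW"),
        ("1993-10-14", "5-LOW"),
        ("1994-07-30", "5-LOW"),
        ("1995-10-11", "5-LOW"),
    ],
)
n = 3

# Limit first, then group: the n rows may all fall into the same group.
limit_then_group = conn.execute(
    "SELECT o_orderdate, COUNT(*) FROM "
    "(SELECT o_orderdate FROM h_orders ORDER BY o_orderdate LIMIT ?) "
    "GROUP BY o_orderdate ORDER BY o_orderdate",
    (n,),
).fetchall()

# Group first, then limit: n groups come back, each with its full count.
group_then_limit = conn.execute(
    "SELECT o_orderdate, COUNT(*) AS counter FROM h_orders "
    "GROUP BY o_orderdate ORDER BY o_orderdate LIMIT ?",
    (n,),
).fetchall()

print(limit_then_group)
print(group_then_limit)
```

Limiting first collapses the three selected rows into one group, much like the single "1992-01-01 5" row in the question, while grouping first returns three groups.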

Oracle Error : Maximum number of expressions in a list is 1000

I am working in C#.NET and Oracle. I am passing a string to a query. I used this code to concatenate all the item IDs:
List<string> listRetID = new List<string>();
foreach (DataRow row in dtNew.Rows)
{
listRetID.Add(row[3].ToString());
}
The concatenated list goes above 10,000 items, so I am getting this error message:
ORA-01795: maximum number of expressions in a list is 1000
How can I fix this?
The documentation states:
A comma-delimited list of expressions can contain no more than 1000
expressions. A comma-delimited list of sets of expressions can contain
any number of sets, but each set can contain no more than 1000
expressions.
Presumably you're using this string as the contents of an IN (...) restriction, in which case there isn't really anything you can do; this just won't work. A common workaround is to generate a dummy table as a subquery or common table expression (CTE) and join to that, but I'm not sure how you'd translate your List, possibly in a similar way to whatever you're doing with your IN clause. You'd want your query to end up looking something like:
with tmp_tab as (
  select <val1 from list> as val from dual
  union all select <val2 from list> from dual
  union all select <val3 from list> from dual
  ...
)
select <something>
from <your table> yt
join tmp_tab tt on yt.<field> = tt.val
But that requires generating the entire (huge) query including the CTE each time you run it, and there's no opportunity to use bind variables.
You might find something like this approach more palatable.
You can have 10 lists of 1000 items instead of 1 list of 10000 items.
WHERE some_column IN (1,2,...,1000)
OR some_column IN (1001,1002,...,2000) -- etc.
I'm not a C# guy, but I would just split the list listRetID into multiple lists, or create a list of lists.
Then loop through that list of lists and perform the query on each element.
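The list-splitting idea can be sketched as follows, in Python rather than C#. The helper names chunked and build_in_clause are hypothetical, and the "?" placeholders are generic; an Oracle driver would typically expect named or numbered bind variables instead.

```python
from typing import List, Sequence, Tuple


def chunked(items: Sequence[str], size: int = 1000) -> List[List[str]]:
    """Split items into consecutive lists of at most `size` elements."""
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def build_in_clause(
    column: str, ids: Sequence[str], size: int = 1000
) -> Tuple[str, List[str]]:
    """Build '(col IN (...) OR col IN (...))' with one placeholder per value."""
    if not ids:
        # An empty IN list is invalid SQL, so return a condition that never matches.
        return "1 = 0", []
    chunks = chunked(ids, size)
    parts = []
    for chunk in chunks:
        placeholders = ", ".join("?" for _ in chunk)
        parts.append(f"{column} IN ({placeholders})")
    clause = "(" + " OR ".join(parts) + ")"
    params = [value for chunk in chunks for value in chunk]
    return clause, params


# Example: 2500 ids become three IN lists of 1000, 1000 and 500 values.
example_ids = [str(i) for i in range(2500)]
example_clause, example_params = build_in_clause("some_column", example_ids)
print(len(chunked(example_ids)), len(example_params))
```

The same chunks can also be used to run one query per chunk and merge the results in application code, as suggested above.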
What is the intent of your query?
It looks like you are selecting rows that have some column equal to the 3rd column of one of the records of some query.
The correct way of doing this is either an SQL join or a subquery. There is absolutely no need to bring this into C# code. For example, using a subquery you can write something like this:
SELECT *
FROM atable
WHERE afield IN (
SELECT field3
FROM someothertable)
