How to use LINQ (or TSQL) to group dataset but include top item and counts - linq

I'm struggling with how to approach this. I have your basic dataset returned by a TSQL (SQL2005) query where the data contains a list of master and detail items. Pretty vanilla, you have your master table joined to your detail table so you might get mulitple rows back per master record.
Something like:
ID / Item Descr / Subitem Descr
1 / Jane Doe / shoes
1 / Jane Doe / hats
2 / John Smith / hats
What I'd like to get to do is "flatten" that out a bit. So something like:
ID / Item Descr / Count / Most Recent Subitem
1 / Jane Doe / 2 / shoes
2 / John Smith / 1 / hats
Any suggestions on the sql query, or perhaps a LINQ query that I could run on the dataset I get back from the initial sql query...?

var q = from i in Context.Items
group i by i.ItemDescr into g
select new
{
ID = g.FirstOrDefault().ID,
ItemDescr = g.Key,
Count = g.Count(),
MostRecentSubItem = g.FirstOrDefault().SubitemDescr // you don't show how to pick the "most recent"
};

Not quite sure what you mean by "top" but this is a start if you want to do it in SQL:
SELECT id, ItemDesc, COUNT(*), MAX(SubItemDesc)
FROM MyTable
GROUP BY id, ItemDesc

I ended up using a correlated sub-query, which always make me nervous from a performance perspective. I don't have numbers to back that up and am not great at reading Execution Plans, it's just a gut feeling.
Lot's of good stackoverflow examples on it that got me down that path(like: Advanced SQL query with sub queries, group by, count and sum functions in to SQLalchemy and Variant use of the GROUP BY clause in TSQL)
I was wondering if using any of the new windowing functions (row_number() or rank()) would have helped, but I didn't get very far down that line.
I still wonder if it would be better to just pull back all the master-detail rows, then "post process" them using LINQ would be better. Our SQL Server would throw back that dataset to the web server pretty darn fast, then "distribute" the work load by doing LINQ on the web server, finally binding it all up to my ASP.NET gridview and sending it the client. A lot of moving parts, but could be interesting...

Related

Oracle Sql group function is not allowed here

I need someone who can explain me about "group function is not allowed here" because I don't understand it and I would like to understand it.
I have to get the product name and the unit price of the products that have a price above the average
I initially tried to use this, but oracle quickly told me that it was wrong.
SELECT productname,unitprice
FROM products
WHERE unitprice>(AVG(unitprice));
search for information and found that I could get it this way:
SELECT productname,unitprice FROM products
WHERE unitprice > (SELECT AVG(unitprice) FROM products);
What I want to know is why do you put two select?
What does group function is not allowed here mean?
More than once I have encountered this error and I would like to be able to understand what to do when it appears
Thank you very much for your time
The phrase "group function not allowed here" is referring to anything that is in some way an "aggregation" of data, eg SUM, MIN, MAX, etc et. These functions must operate on a set of rows, and to operate on a set of rows you need to do a SELECT statement. (I'm leaving out UPDATE/DELETE here)
If this was not the case, you would end up with ambiguities, for example, lets say we allowed this:
select *
from products
where region = 'USA'
and avg(price) > 10
Does this mean you want the average prices across all products, or just the average price for those products in the USA? The syntax is no longer deterministic.
Here's another option:
SELECT *
FROM (
SELECT productname,unitprice,AVG(unitprice) OVER (PARTITION BY 1) avg_price
FROM products)
WHERE unitprice > avg_price
The reason your original SQL doesn't work is because you didn't tell Oracle how to compute the average. What table should it find it in? What rows should it include? What, if any, grouping do you wish to apply? None of that is communicated with "WHERE unitprice>(AVG(unitprice))".
Now, as a human, I can make a pretty educated guess that you intend the averaging to happen over the same set of rows you select from the main query, with the same granularity (no grouping). We can accomplish that either by using a sub-query to make a second pass on the table, as your second SQL did, or the newer windowing capabilities of aggregate functions to internally make a second pass on your query block results, as I did in my answer. Using the OVER clause, you can tell Oracle exactly what rows to include (ROWS BETWEEN ...) and how to group it (PARTITION BY...).

Cognos 11 Crosstab - need a value that doesn't have a reference to the column values

Crosstab report works 99%.
About 20 rows, all but one are ok.
5 columns - Company Division.
The rows are things like cost, revenue, revenue 2, etc.
All the rows that work have three attributes I'm using to select them:
Fiscal Year
Period
Solution.
The problem is there is table that lists an YTD rate for each period. This table is not Division Specific; it's company wide.
All the tables are linked to the accounting period table that has fiscal year and period. So the overall query limits data to fiscal year (?pFiscalYear?) and period <= ?pPeriod?, based on prompt page results.
The source table has this:
FY_CD PD_NO ACT_CURR_RT ACT_YTD_RT
2018 1 0.36121715 0.36121715
2018 2 0.32471476 0.34255512
2018 3 0.25240906 0.31210183
2018 4 0.33154745 0.31925874
Note the YTD rate is not an average of any of the other numbers.
When I select the ACT_YTD_RT, as a row, I want the ACT_YTD_RT that matches the selected period.
What I get is the average if I set the aggregation to average or the lowest if I set it to other aggregations. So sometimes, it looks right (if I run for period 1,2,3, as the rate kept falling), and sometimes it's wrong (period 4
returns .3121 instead of .3192).
I've tried a number of different methods and can generate garbage data (totals, min, max, average) and crossjoins but can't figure out how to get the value I'm looking for.
I want YTD_RT where fiscal year =?pFiscal? and period = ?pPeriod?.
I tried a straight if then clause:
if (sourcetable.fiscalYear = ?pFiscalYear?) and (sourcetable.Period = ?pPeriod?) then (ACT_YTD_RT)
but I get an error like this:
'ACT_YTD_RT' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. (SQLSTATE=42000, SQLERRORCODE=8120)
If I create another query that generates the right response and try to include it, I get a crossjoin error that the query I'm referencing is trying to crossjoin several other items in the crosstab query.
A union doesn't work (different number of columns).
Not sure how a join would work since the division doesn't exist in the rate table.
I maybe could create a view in the database that did a crossjoin of the division table and the rate table, add that to the framework and then I wouldn't have a crossjoin since the solution would be in the rate "table" (really view), but that seems wrong somehow.
If I could just write a freaking parameterized query direct to the database I'd be done. But in Cognos 11 crosstabs I can't find a place for a SQL query object. And that shouldn't be necessary.
I've spent hours and hours chasing this in circles.
Anybody have any ideas?
Thanks
Paul
So the earlier problem was that this:
if (sourcetable.fiscalYear = ?pFiscalYear?) and (sourcetable.Period = ?pPeriod?) then (ACT_YTD_RT)
Generated an error like this:
'ACT_YTD_RT' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. (SQLSTATE=42000, SQLERRORCODE=8120)
To fix the above, I had to add a cross join of the division table and the rate table as a view in the database. Then add that to the framework. Then build the data item this way:
total (
if (sourcetable.fiscalYear = ?pFiscalYear?) and (sourcetable.Period = ?pPeriod?) then (ACT_YTD_RT)
)
And now the "total" provides the missing group by. And the crossjoin in the database provides the division information so the crosstab is happy.
I still think there should have been an easier way to do this, but I have a functioning hammer at the moment.

making inner join linq or using [table].[joiningTable].column is the same?

I have recently started working on linq and I was wondering suppose I have 2 related tables Project (<=with fkAccessLevelId) and AccessLevel and I want to just select values from both tables. Now there are 2 ways I can select values from these tables.
The one i commonly use is:
(from P in DataContext.Projects
join AL in DataContext.AccessLevel
on P.AccessLevelId equals AL.AccessLevelId
select new
{
ProjectName = P.Name,
Access = AL.AccessName
}
Another way of doing this would be:
(from P in DataContext.Projects
select new
{
ProjectName = P.Name,
Access = P.AccessLevel.AccessName
}
What i wanted to know is which of these way is efficient if we increase the number of table say 5-6 with 1-2 tables containing thousands of records...?
You should take a look at the SQL generated. You have to understand that there are several main performance bottle necks in a Linq query (in this case I assume a OMG...Linq to SQL?!?!) the usual main bottle neck is the SQL query on the server.
Typically SQL Server has a very good optimizer, so actually, given the same query, refactored, the perf is pretty uniform.
However in your case, there is a very real difference in the two queries. A project with no Access Level would not appear in the first query, whilst the second query would return with a null AccessName. In effect you would be comparing a LEFT JOIN to an INNER JOIN.
TL:DR For SQL Server/Linq to Entity Framework queries that do the same thing should give similar performance. However your queries are far from similar.

Ruby on Rails: Search one table where multiple rows must be present in another table

I'm trying to create a search where a single record must have multiple records in another table (linked by id's and has_many statements) in order to be included as a result.
I have tables users, skill_lists, skill_maps.
users are mapped to individual skills through single entries in the skill_maps table. Many user can share a single skill and single user can have many skills trough multiple entries in the skill_maps table.
e.g.
User_id | Skill_list_id
2 | 9
2 | 15
3 | 9
user 2 has skills 9 and 15
user 3 has only skill 9
I'm trying to create a search that returns a hash of all users which have a set of skills. The set of required skill_ids appear as an array in the params.
Here's the code that I'm using:
skill_selection_user_ids = SkillMap.find_all_by_skill_list_id(params[:skill_ids]).map(&:user_id)
#results = User.find(:all, :conditions => {:id => skill_selection_user_ids})
The problem is that this returns all users that have ANY of these skills not users that have ALL of them.
Also, my users table is linked to the skill_lists table :through => :skill_maps and visa versa so that i can call #user.skill_list etc...
I'm sure this is a real newbie question, I'm totally new to rails (and programming). I searched and searched for a solution but couldn't find anything. I don't really know how to explain the problem in a single search term.
I personally don't know how to do this using ActiveRecord's query interface. The easiest thing to do would be to retrieve lists of users who have each individual skill, and then take the intersection of those lists, perhaps using Set:
require 'set'
skills = [5, 10, 19] # for example
user_ids = skills.map { |s| Set.new(SkillMap.find_all_by_skill_list_id(s).map(&:user_id)) }.reduce(:&)
users = User.where(:id => user_ids.to_a)
For (likely) higher performance, you could "roll your own" SQL and let the DB engine do the work. I may be able to come up with some SQL for you, if you need high performance here. (Or if anyone else can, please edit this answer!)
By the way, you should probably put an index on skill_maps.skill_list_id to ensure good performance even if the skill_maps table gets very large. See the ActiveMigration documentation: http://api.rubyonrails.org/classes/ActiveRecord/Migration.html
You'll probably have to use some custom SQL to get the user IDs. I tested this query on a similar HABTM relationship and it seems to work:
SELECT DISTINCT(user_id) FROM skill_maps AS t1 WHERE (SELECT COUNT(skill_list_id) FROM skill_maps AS t2 WHERE t2.user_id = t1.user_id AND t2.skill_list_id IN (1,2,3)) = 3
The trick is in the subquery. For each row in the outer query, it finds a count of records for that row that match any of the skills that you're interested in. Then it checks whether that count matches the total number of skills you're interested in. If there's a match, then the user must possess all of the skills you searched for.
You could execute this in Rails using find_by_sql:
sql = 'SELECT DISTINCT(user_id) FROM skill_maps AS t1 WHERE (SELECT COUNT(skill_list_id) FROM skill_maps AS t2 WHERE t2.user_id = t1.user_id AND t2.skill_list_id IN (?)) = ?'
skill_ids = params[:skill_ids]
user_ids = SkillMap.find_by_sql([sql, skill_ids, skill_ids.size])
Sorry if the table and column names aren't exactly right, but hopefully this is in the ballpark.

Using Linq to get summaries and count

I have a SQL Server database with an Events table that stores around a million events, each with a dateStart, dateEnd, title, rating and other bits.
I need to display a list of years, where each year displays the 5 highest rated events and the total count of events in that year.
So, something like...
Top 5 events for 2009 (from 199 events)
- Event A
- Event B
- Event C
- Event D
- Event E
Top 5 events for 2010 (from 469 events)
- Event F
- Event G
- Event H
- Event I
- Event J
.... etc.
Because of the sheer quantity of records, I'd like to avoid a Linq query that will pull everything out of the database, but I don't know if that is possible and my Linq knowledge is not enough to work out how this would work.
What is the most efficient method for retrieving this data structure from the database?
Thanks a lot in advance - been mangling my brain all day trying to work it out.
Well, let's assume that you only care about the start date. You'd want something like:
var query = from eventInfo in db.Events
group eventInfo by eventInfo.dateStart.Year into g
select
{
Year = g.Key, // Key for the grouping
Count = g.Count(), // Events for this year
Top5 = g.OrderByDescending(e => e.Rating).Take(5)
};
Frankly I have no idea what kind of SQL that will produce, or if it'll be even vaguely efficient - you should find out the SQL generated, and then look in the SQL profiler to see that the query execution plan is.
You will have to use an ORM, there are several possibilites, including (but not limited to) Linq-To-Sql, Entity Framework, NHibernate (I'm not sure about LINQ support here), etc.
All of those will translate the LINQ query operators to an sql query that will be executed in the database, and will only return the information you need or requested.

Resources