Is this linq query efficient? - linq

Is this linq query efficient?
var qry = ((from member in this.ObjectContext.TreeMembers.Where(m => m.UserId == userId && m.Birthdate == null)
select member.TreeMemberId).Except(from item in this.ObjectContext.FamilyEvents select item.TreeMemberId));
var mainQry = from mainMember in this.ObjectContext.TreeMembers
where qry.Contains(mainMember.TreeMemberId)
select mainMember;
Will this be translated into multiple sql calls or just one? Can it be optimised? Basically I have 2 tables, I want to select those records from table1 where datetime is null and that record should not exist in table2.

The easiest way to find out if the query will make multiple calls is to set the .Log property of the data context. I typically set it to write to a DebugOutputWriter. A good example for this kind of class can be found here.
For a general way of thinking about it however, if you use a property of your class that does not directly map to a database field in a where clause or a join clause, it will typically make multiple calls. From what you have provided, it looks like this is not the case for your scenario, but I can't absolutely certain and suggest using the method listed above.

Related

ActiveRecord Subquery Inner Join

I am trying to convert a "raw" PostGIS SQL query into a Rails ActiveRecord query. My goal is to convert two sequential ActiveRecord queries (each taking ~1ms) into a single ActiveRecord query taking (~1ms). Using the SQL below with ActiveRecord::Base.connection.execute I was able to validate the reduction in time.
Thus, my direct request is to help me to convert this query into an ActiveRecord query (and the best way to execute it).
SELECT COUNT(*)
FROM "users"
INNER JOIN (
SELECT "centroid"
FROM "zip_caches"
WHERE "zip_caches"."postalcode" = '<postalcode>'
) AS "sub" ON ST_Intersects("users"."vendor_coverage", "sub"."centroid")
WHERE "users"."active" = 1;
NOTE that the value <postalcode> is the only variable data in this query. Obviously, there are two models here User and ZipCache. User has no direct relation to ZipCache.
The current two step ActiveRecord query looks like this.
zip = ZipCache.select(:centroid).where(postalcode: '<postalcode>').limit(1).first
User.where{st_intersects(vendor_coverage, zip.centroid)}.count
Disclamer: I've never used PostGIS
First in your final request, it seems like you've missed the WHERE "users"."active" = 1; part.
Here is what I'd do:
First add a active scope on user (for reusability)
scope :active, -> { User.where(active: 1) }
Then for the actual query, You can have the sub query without executing it and use it in a joins on the User model, such as:
subquery = ZipCache.select(:centroid).where(postalcode: '<postalcode>')
User.active
.joins("INNER JOIN (#{subquery.to_sql}) sub ON ST_Intersects(users.vendor_coverage, sub.centroid)")
.count
This allow minimal raw SQL, while keeping only one query.
In any case, check the actual sql request in your console/log by setting the logger level to debug.
The amazing tool scuttle.io is perfect for converting these sorts of queries:
User.select(Arel.star.count).where(User.arel_table[:active].eq(1)).joins(
User.arel_table.join(ZipCach.arel_table).on(
Arel::Nodes::NamedFunction.new(
'ST_Intersects', [
User.arel_table[:vendor_coverage], Sub.arel_table[:centroid]
]
)
).join_sources
)

ActiveRecord 4 cannot retrieve "select AS" field

Ok, I feel really stupid for asking this, but it's driving me nuts and I can't figure it out. The docs say I should be able to use select AS in a Rails/ActiveRecord query. So:
d = Dvd.where(id: 1).select("title AS my_title")
Is a valid query and if I do a to_sql on it, it produces the expected SQL:
SELECT title AS my_title FROM `dvd` WHERE `dvd`.`id` = 1
However, d.my_title will give an error:
NoMethodError: undefined method `my_title' for #<ActiveRecord::Relation
I need to be able to use AS since the columns I want to retrieve from different joins have the same name so I can't access them the "regular" way and have to resort to using AS.
I also don't want to resort to using find_by_sql for future compatibility and a possible switch form Mysql to PostGresql.
Just to clarify, what I'm really trying to do is write this SQL in a Railsy way:
SELECT tracks.name AS track_name, artists.name AS artist_name, composers.name AS composer_name, duration
FROM `tracks_cds`
INNER JOIN `tracks` ON `tracks`.`id` = `tracks_cds`.`track_id`
INNER JOIN `artists` ON `artists`.`id` = `tracks_cds`.`artist_id`
INNER JOIN `composers` ON `composers`.`id` = `tracks_cds`.`composer_id`
WHERE cd_id = cd.id
The top example was just a simplification of the fact that SELECT AS will not give you an easy way to refer to custom fields which I find hard to believe.
ActiveRecord automatically creates getter and setter methods for attributes based on the column names in the database, so there will be none defined for my_title.
Regarding the same common names, why not just do this:
d = Dvd.where(id: 1).select("dvds.title")
You can write your sql query and then just pass into ActiveRecord's execute method
query = "SELECT title AS my_title FROM `dvd` WHERE `dvd`.`id` = 1"
result = ActiveRecord::Base.connection.execute(query)

Get all the includes from an Entity Framework Query?

I've the following Entity Model : Employee has a Company and a Company has Employees.
When using the Include statement like below:
var query = context.Employees.Include(e => e.Company);
query.Dump();
All related data is retrieved from the database correctly. (Using LEFT OUTER JOIN on Company table)
The problem is hat when I use the GroupBy() from System.Linq.Dynamic to group by Company.Name, the Employees are missing the Company data because the Include is lost.
Example:
var groupByQuery = query.GroupBy("new (Company.Name as CompanyName)", "it");
groupByQuery.Dump();
Is there a way to easily retrieve the applied Includes on the 'query' as a string collection, so that I can include them in the dynamic GroupBy like this:
var groupByQuery2 = query.GroupBy("new (Company, Company.Name as CompanyName)", "it");
groupByQuery2.Dump();
I thought about using the ToString() functionality to get the SQL Command like this:
string sql = query.ToString();
And then use RegEx to extract all LEFT OUTER JOINS, but probably there is a better solution ?
if you're creating the query in the first place - I'd always opt to save the includes (and add to them if you're making a composite query/filtering).
e.g. instead of returning just 'query' return new QueryContext {Query = query, Includes = ...}
I'd like to see a more elegant solution - but I think that's your best bet.
Otherwise you're looking at expression trees, visitors and all those nice things.
SQL parsing isn't that straight either - as queries are not always that simple (often a combo of things etc.).
e.g. there is a `span' inside the query object (if you traverse a bit) which seems to be holding the 'Includes' but it's not much help.

Is it possible to detect if the selected item is the first in LINQ-to-SQL?

I wonder how I can build a query expression which understands the given item being selected is the first or not. Say I'm selecting 10 items from DB:
var query = db.Table.Take(10).Select(t => IsFirst ? t.Value1 : t.Value2);
There is an indexed variant of Select but that is not supported in LINQ-to-SQL. If it was supported my problems would be solved. Is there any other trick?
I could have used ROW_NUMBER() on T-SQL for instance, which LINQ-to-SQL uses but does not give access to.
I know I can Concat two queries and use the first expression in the first and so forth but I don't want to manipulate the rest of the query, just the select statement itself because the query is built at multiple places and this is where I want to behave differently on first row. I'll consider other options if that is not possible.
You can use the indexed overload, but you need to use the LINQ to Objects version:
var query =
db.Table.Take(10).AsEnumreable()
.Select((t, index) => index == 0 ? t.Value1 : t.Value2);
If Table have a primary key. You could do this:
var result= (
from t in db.Table.Take(10)
let first=db.Table.Take(10).Select (ta =>ta.PrimayKey).First()
select new
{
Value=(t.PrimaryKey=first?t.Value1 : t.Value2)
}
);

Can I force the auto-generated Linq-to-SQL classes to use an OUTER JOIN?

Let's say I have an Order table which has a FirstSalesPersonId field and a SecondSalesPersonId field. Both of these are foreign keys that reference the SalesPerson table. For any given order, either one or two salespersons may be credited with the order. In other words, FirstSalesPersonId can never be NULL, but SecondSalesPersonId can be NULL.
When I drop my Order and SalesPerson tables onto the "Linq to SQL Classes" design surface, the class builder spots the two FK relationships from the Order table to the SalesPerson table, and so the generated Order class has a SalesPerson field and a SalesPerson1 field (which I can rename to SalesPerson1 and SalesPerson2 to avoid confusion).
Because I always want to have the salesperson data available whenever I process an order, I am using DataLoadOptions.LoadWith to specify that the two salesperson fields are populated when the order instance is populated, as follows:
dataLoadOptions.LoadWith<Order>(o => o.SalesPerson1);
dataLoadOptions.LoadWith<Order>(o => o.SalesPerson2);
The problem I'm having is that Linq to SQL is using something like the following SQL to load an order:
SELECT ...
FROM Order O
INNER JOIN SalesPerson SP1 ON SP1.salesPersonId = O.firstSalesPersonId
INNER JOIN SalesPerson SP2 ON SP2.salesPersonId = O.secondSalesPersonId
This would make sense if there were always two salesperson records, but because there is sometimes no second salesperson (secondSalesPersonId is NULL), the INNER JOIN causes the query to return no records in that case.
What I effectively want here is to change the second INNER JOIN into a LEFT OUTER JOIN. Is there a way to do that through the UI for the class generator? If not, how else can I achieve this?
(Note that because I'm using the generated classes almost exclusively, I'd rather not have something tacked on the side for this one case if I can avoid it).
Edit: per my comment reply, the SecondSalesPersonId field is nullable (in the DB, and in the generated classes).
The default behaviour actually is a LEFT JOIN, assuming you've set up the model correctly.
Here's a slightly anonymized example that I just tested on one of my own databases:
class Program
{
static void Main(string[] args)
{
using (TestDataContext context = new TestDataContext())
{
DataLoadOptions dlo = new DataLoadOptions();
dlo.LoadWith<Place>(p => p.Address);
context.LoadOptions = dlo;
var places = context.Places.Where(p => p.ID >= 100 && p.ID <= 200);
foreach (var place in places)
{
Console.WriteLine(p.ID, p.AddressID);
}
}
}
}
This is just a simple test that prints out a list of places and their address IDs. Here is the query text that appears in the profiler:
SELECT [t0].[ID], [t0].[Name], [t0].[AddressID], ...
FROM [dbo].[Places] AS [t0]
LEFT OUTER JOIN (
SELECT 1 AS [test], [t1].[AddressID],
[t1].[StreetLine1], [t1].[StreetLine2],
[t1].[City], [t1].[Region], [t1].[Country], [t1].[PostalCode]
FROM [dbo].[Addresses] AS [t1]
) AS [t2] ON [t2].[AddressID] = [t0].[AddressID]
WHERE ([t0].[PlaceID] >= #p0) AND ([t0].[PlaceID] <= #p1)
This isn't exactly a very pretty query (your guess is as good as mine as to what that 1 as [test] is all about), but it's definitively a LEFT JOIN and doesn't exhibit the problem you seem to be having. And this is just using the generated classes, I haven't made any changes.
Note that I also tested this on a dual relationship (i.e. a single Place having two Address references, one nullable, one not), and I get the exact same results. The first (non-nullable) gets turned into an INNER JOIN, and the second gets turned into a LEFT JOIN.
It has to be something in your model, like changing the nullability of the second reference. I know you say it's configured as nullable, but maybe you need to double-check? If it's definitely nullable then I suggest you post your full schema and DBML so somebody can try to reproduce the behaviour that you're seeing.
If you make the secondSalesPersonId field in the database table nullable, LINQ-to-SQL should properly construct the Association object so that the resulting SQL statement will do the LEFT OUTER JOIN.
UPDATE:
Since the field is nullable, your problem may be in explicitly declaring dataLoadOptions.LoadWith<>(). I'm running a similar situation in my current project where I have an Order, but the order goes through multiple stages. Each stage corresponds to a separate table with data related to that stage. I simply retrieve the Order, and the appropriate data follows along, if it exists. I don't use the dataLoadOptions at all, and it does what I need it to do. For example, if the Order has a purchase order record, but no invoice record, Order.PurchaseOrder will contain the purchase order data and Order.Invoice will be null. My query looks something like this:
DC.Orders.Where(a => a.Order_ID == id).SingleOrDefault();
I try not to micromanage LINQ-to-SQL...it does 95% of what I need straight out of the box.
UPDATE 2:
I found this post that discusses the use of DefaultIfEmpty() in order to populated child entities with null if they don't exist. I tried it out with LINQPad on my database and converted that example to lambda syntax (since that's what I use):
ParentTable.GroupJoin
(
ChildTable,
p => p.ParentTable_ID,
c => c.ChildTable_ID,
(p, aggregate) => new { p = p, aggregate = aggregate }
)
.SelectMany (a => a.aggregate.DefaultIfEmpty (),
(a, c) => new
{
ParentTableEntity = a.p,
ChildTableEntity = c
}
)
From what I can figure out from this statement, the GroupJoin expression relates the parent and child tables, while the SelectMany expression aggregates the related child records. The key appears to be the use of the DefaultIfEmpty, which forces the inclusion of the parent entity record even if there are no related child records. (Thanks for compelling me to dig into this further...I think I may have found some useful stuff to help with a pretty huge report I've got on my pipeline...)
UPDATE 3:
If the goal is to keep it simple, then it looks like you're going to have to reference those salesperson fields directly in your Select() expression. The reason you're having to use LoadWith<>() in the first place is because the tables are not being referenced anywhere in your query statement, so the LINQ engine won't automatically pull that information in.
As an example, given this structure:
MailingList ListCompany
=========== ===========
List_ID (PK) ListCompany_ID (PK)
ListCompany_ID (FK) FullName (string)
I want to get the name of the company associated with a particular mailing list:
MailingLists.Where(a => a.List_ID == 2).Select(a => a.ListCompany.FullName)
If that association has NOT been made, meaning that the ListCompany_ID field in the MailingList table for that record is equal to null, this is the resulting SQL generated by the LINQ engine:
SELECT [t1].[FullName]
FROM [MailingLists] AS [t0]
LEFT OUTER JOIN [ListCompanies] AS [t1] ON [t1].[ListCompany_ID] = [t0].[ListCompany_ID]
WHERE [t0].[List_ID] = #p0

Resources