How do I perform a full join in ARel? - ruby

Assume I have two tables TableA and TableB with a many-to-many relationship through a joining table TableABJoin. I would like to use ARel 3 to generate a query that performs a full join of TableA and TableB.
The query I want to generate should be something along these lines:
SELECT a.id, b.code
FROM TableA as a, TableB as b
This returns every combination of rows from A and B (strictly speaking a cross join, or Cartesian product, rather than a full outer join).
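Listing two tables in FROM separated by a comma pairs every row of the first with every row of the second. A minimal pure-Ruby sketch of that result, using hypothetical in-memory rows in place of the real tables (no ActiveRecord or Arel involved):

```ruby
# Hypothetical rows standing in for TableA and TableB.
table_a = [{ id: 1 }, { id: 2 }]
table_b = [{ code: "x" }, { code: "y" }, { code: "z" }]

# FROM TableA AS a, TableB AS b: every row of A paired with every row of B.
cross = table_a.product(table_b).map { |a, b| [a[:id], b[:code]] }

p cross      # => [[1, "x"], [1, "y"], [1, "z"], [2, "x"], [2, "y"], [2, "z"]]
puts cross.size  # 2 rows * 3 rows = 6
```

A full outer join is a different operation: it matches rows on a condition and pads the unmatched rows from either side with NULLs.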
The closest I have been able to get, without writing an explicit sql string, is to hack an outer join:
part_a = TableA.arel_table
part_b = TableB.arel_table
query = part_a.join(part_b, Arel::Nodes::OuterJoin).on('1=1').project(part_a[:id], part_b[:code]).to_sql
This produces the following SQL:
SELECT "TableA"."id", "TableB"."code" FROM "TableA" LEFT OUTER JOIN "TableB" ON 1=1
If I exclude the .on component I end up with a trailing NULL:
SELECT "TableA"."id", "TableB"."code" FROM "TableA" LEFT OUTER JOIN "TableB" NULL
Is there a more reasonable way to generate a proper full join or at least generate the same result without hacking a left outer join in ARel?
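For what it's worth, the ON 1=1 hack only matches the comma join while TableB has rows. A LEFT OUTER JOIN keeps every left-hand row, so with an empty TableB the hack returns each TableA id paired with NULL, whereas the comma (cross) join returns nothing. A pure-Ruby sketch of both behaviours; `left_outer_join` is a hypothetical helper, not an Arel method:

```ruby
# Hypothetical helper: nested-loop LEFT OUTER JOIN with an arbitrary ON predicate.
def left_outer_join(left, right)
  left.flat_map do |l|
    matches = right.select { |r| yield(l, r) }
    # An unmatched left row is kept once, padded with nil (SQL NULL).
    matches.empty? ? [[l, nil]] : matches.map { |r| [l, r] }
  end
end

table_a = [{ id: 1 }, { id: 2 }]
table_b = []  # an empty TableB

hack  = left_outer_join(table_a, table_b) { true }  # LEFT OUTER JOIN ... ON 1=1
cross = table_a.product(table_b)                    # FROM TableA, TableB

p(hack.map { |a, b| [a[:id], b] })  # => [[1, nil], [2, nil]]
p cross                             # => []
```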

Actually, it is not possible to perform a full outer join (or a right outer join) with Arel 3, which only supports inner and outer (LEFT OUTER) joins.
Because I didn't like this, I updated arel 3-0-stable (I'm working on a Rails 3.2.13 application) so that it supports right and full outer joins too. Adding them was straightforward even though the code is undocumented, so you should have no trouble if you try it yourself.
Here you can find my pull request: https://github.com/rails/arel/pull/202
And here you can find the updated repository with a branch: https://github.com/Fire-Dragon-DoL/arel/tree/3-0-right-full-outer-join
You can easily use this in a rails application by adding this to your Gemfile:
gem 'arel', '~> 3.0.3.4', github: 'Fire-Dragon-DoL/arel', branch: '3-0-right-full-outer-join'
Here you can see a syntax example:
class Cat < ActiveRecord::Base
  has_many :cat_diseases
end
class Disease < ActiveRecord::Base
  has_many :cat_diseases
end
class CatDisease < ActiveRecord::Base
  belongs_to :cat
  belongs_to :disease
  def self.all_diseases_for_cat(cat)
    cat_diseases = self.arel_table
    diseases = Disease.arel_table
    scoped
      .joins(
        cat_diseases.join(diseases, Arel::Nodes::RightOuterJoin)
          .on(cat_diseases[:disease_id].eq(diseases[:id]))
          .join_sources
      )
      .where(
        cat_diseases[:cat_id].eq(cat.id)
          .or(cat_diseases[:cat_id].eq(nil))
      )
  end
end
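One thing to watch in this example: because the cat_id test sits in WHERE, a disease linked only to other cats produces a row whose cat_id is neither this cat's id nor NULL, so that disease is filtered out. If the goal is every disease, linked to this cat or not, the condition belongs in the ON clause. A pure-Ruby sketch of the difference, with hypothetical rows and a hypothetical `right_outer_join` helper (not part of Arel):

```ruby
# Hypothetical rows: cat 1 has flu, cat 2 has mange, nobody has rabies.
cat_diseases = [{ cat_id: 1, disease_id: 10 }, { cat_id: 2, disease_id: 20 }]
diseases = [{ id: 10, name: "flu" }, { id: 20, name: "mange" }, { id: 30, name: "rabies" }]

# Hypothetical helper: keeps every right-hand row, padding unmatched ones with nil.
def right_outer_join(left, right)
  right.flat_map do |r|
    matches = left.select { |l| yield(l, r) }
    matches.empty? ? [[nil, r]] : matches.map { |l| [l, r] }
  end
end

cat_id = 1

# cat_id test in WHERE, as in all_diseases_for_cat: "mange" is lost.
in_where = right_outer_join(cat_diseases, diseases) { |cd, d| cd[:disease_id] == d[:id] }
  .select { |cd, _| cd.nil? || cd[:cat_id] == cat_id }
  .map { |_, d| d[:name] }

# cat_id test in ON: every disease survives.
in_on = right_outer_join(cat_diseases, diseases) { |cd, d| cd[:disease_id] == d[:id] && cd[:cat_id] == cat_id }
  .map { |_, d| d[:name] }

p in_where  # => ["flu", "rabies"]
p in_on     # => ["flu", "mange", "rabies"]
```

In the Arel code above, that would presumably mean moving the test into the join, e.g. `.on(cat_diseases[:disease_id].eq(diseases[:id]).and(cat_diseases[:cat_id].eq(cat.id)))`, and dropping the `.where(...)` call; treat this as a suggestion to check against your own data.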

I write the join in SQL. It's clearer and works fine (note that a LEFT OUTER JOIN needs an ON clause, while a CROSS JOIN gives every A/B combination directly):
query = TableA.joins('CROSS JOIN "TableB"').select('"TableA"."id", "TableB"."code"').to_sql

Related

activerecord class from custom sql query

Is it possible to create an ActiveRecord class from a custom SQL query or view?
It doesn't need to be editable.
example:
class c
select a.*, b.* from a, b where a.code = b.code
end
This example is a join where all the fields from both tables are present. With the one-to-one associations in ActiveRecord, a model only exposes the fields from its own table; the others are accessible through a.bs.fieldname.
I would like them all to be fields on the same level, thus in one class,
so that a.code, a.name and b.code, b.extra can be accessed as c.code, c.name, c.extra.
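ActiveRecord works with views just as it does with tables. So first create a custom view for your things. Since a and b both have a code column, select only the extra columns from b; most databases reject a view with two columns of the same name:
CREATE VIEW some_things AS SELECT a.*, b.extra FROM a INNER JOIN b ON a.code = b.code;
Then create an ActiveRecord-based class for accessing it (app/models/some_thing.rb):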
class SomeThing < ActiveRecord::Base
end
You can then query your things like any other ActiveRecord objects:
p SomeThing.where(code: 'xxx-yyy').order(:name).limit(10).all
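Conceptually, each row of such a view is one record of a merged with its matching record of b, so code, name and extra end up side by side on one object. A pure-Ruby sketch of that flattening over hypothetical rows (ActiveRecord itself is not involved here):

```ruby
# Hypothetical rows standing in for tables a and b.
a_rows = [{ code: "xxx-yyy", name: "Widget" }, { code: "zzz", name: "Gadget" }]
b_rows = [{ code: "xxx-yyy", extra: "blue" }]

# Inner join on code, then merge each matching pair into one flat record.
# Hash#merge keeps a single :code key, just as the view keeps a single code column.
flat = a_rows.flat_map do |a|
  b_rows.select { |b| b[:code] == a[:code] }.map { |b| a.merge(b) }
end

flat.each { |row| puts "#{row[:code]} #{row[:name]} #{row[:extra]}" }
# prints "xxx-yyy Widget blue"
```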

Sequel: How to use group and count

Simply put, how can I do this query using Sequel?
select a.id, count(t.id)
from albums a
right join tracks t on t.album_id = a.id
group by a.id
DB[:albums___a].
right_join(:tracks___t, :album_id=>:id).
select_group(:a__id).
select_more{count(:t__id)}
Your problem is that the left join finds a track ID for each album ID. Solutions:
a right join;
a subquery of counts, assuming Sequel supports that: left join (select album_id, count(album_id) as count from tracks group by album_id) t on
a straight-up from albums a, tracks t where t.album_id=a.id instead of the join.
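The choice of join decides which rows exist before GROUP BY runs. A pure-Ruby sketch with hypothetical rows: with a LEFT join from albums, an album without tracks still appears and COUNT(t.id) gives it 0, because COUNT skips NULLs; with a RIGHT join every track survives, albums without tracks disappear, and a track whose album_id matches no album lands in a NULL group.

```ruby
# Hypothetical rows: album 2 has no tracks, and track 102 points at a missing album 9.
albums = [{ id: 1 }, { id: 2 }]
tracks = [{ id: 100, album_id: 1 }, { id: 101, album_id: 1 }, { id: 102, album_id: 9 }]

# LEFT JOIN albums -> tracks, GROUP BY a.id, COUNT(t.id):
# an album without tracks has only a NULL-padded row, which COUNT ignores, so it gets 0.
left_counts = albums.to_h do |a|
  [a[:id], tracks.count { |t| t[:album_id] == a[:id] }]
end

# RIGHT JOIN albums -> tracks: every track survives; one without a matching album has a NULL a.id.
album_ids = albums.map { |a| a[:id] }
right_counts = tracks
  .group_by { |t| album_ids.include?(t[:album_id]) ? t[:album_id] : nil }
  .transform_values(&:size)

puts "left:  #{left_counts.inspect}"
puts "right: #{right_counts.inspect}"
```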

How to improve LINQ to EF performance

I have two classes: Property and PropertyValue. A property has several values where each value is a new revision.
When retrieving a set of properties I want to include the latest revision of the value for each property.
in T-SQL this can very efficiently be done like this:
SELECT
p.Id,
pv1.StringValue,
pv1.Revision
FROM dbo.PropertyValues pv1
LEFT JOIN dbo.PropertyValues pv2 ON pv1.Property_Id = pv2.Property_Id AND pv1.Revision < pv2.Revision
JOIN dbo.Properties p ON p.Id = pv1.Property_Id
WHERE pv2.Id IS NULL
ORDER BY p.Id
The "magic" in this query is to join on the lesser than condition and look for rows without a result forced by the LEFT JOIN.
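The LEFT JOIN / IS NULL trick is an anti-join: a value row is kept only when no other row for the same property has a higher revision, i.e. when it is the latest one. A pure-Ruby sketch over hypothetical rows, checked against the more obvious group-and-max formulation that the LINQ group/OrderByDescending/FirstOrDefault query expresses:

```ruby
# Hypothetical PropertyValues rows.
values = [
  { property_id: 1, revision: 1, string_value: "a" },
  { property_id: 1, revision: 3, string_value: "c" },
  { property_id: 1, revision: 2, string_value: "b" },
  { property_id: 2, revision: 1, string_value: "x" }
]

# Anti-join: keep pv1 when no pv2 has the same property and a larger revision
# (the rows for which the LEFT JOIN finds nothing, i.e. pv2.Id IS NULL).
latest_anti = values.reject do |pv1|
  values.any? { |pv2| pv2[:property_id] == pv1[:property_id] && pv1[:revision] < pv2[:revision] }
end

# Group-and-max formulation of the same question.
latest_group = values.group_by { |v| v[:property_id] }.values.map { |g| g.max_by { |v| v[:revision] } }

p latest_anti.map { |v| v[:string_value] }  # => ["c", "x"]
```

One difference to keep in mind: if two rows share the highest revision, the anti-join keeps both, whereas max_by (like FirstOrDefault) keeps only one.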
How can I accomplish something similar using LINQ to EF?
The best thing I could come up with was:
from pv in context.PropertyValues
group pv by pv.Property into g
select g.OrderByDescending(p => p.Revision).FirstOrDefault()
It does produce the correct result but is about 10 times slower than the other.
Maybe this can help. Where db is the database context:
(
    from pv1 in db.PropertyValues
    from pv2 in db.PropertyValues
        .Where(a => a.Property_Id == pv1.Property_Id && pv1.Revision < a.Revision)
        .DefaultIfEmpty()
    join p in db.Properties
        on pv1.Property_Id equals p.Id
    where pv2.Id == null
    orderby p.Id
    select new
    {
        p.Id,
        pv1.StringValue,
        pv1.Revision
    }
);
Next to optimizing a query in Linq To Entities, you also have to be aware of the work it takes for the Entity Framework to translate your query to SQL and then map the results back to your objects.
Comparing a Linq To Entities query directly to a SQL query will always result in lower performance because the Entity Framework does a lot more work for you.
So it's also important to look at optimizing the steps the Entity Framework takes.
Things that could help:
Precompile your query
Pre-generate views
Decide for yourself when to open the database connection
Disable tracking (if appropriate)
Here you can find some documentation with performance strategies.
If you want to combine the less-than condition with a join clause, note that a LINQ join only supports equality keys, and the inner range variable (pv2) is not in scope on the left side of equals. You can join on the equality part and apply the less-than condition to the joined group instead:
from pv1 in db.PropertyValues
join pv2 in db.PropertyValues on pv1.Property_Id equals pv2.Property_Id into temp
from t in temp.Where(x => pv1.Revision < x.Revision).DefaultIfEmpty()
join p in db.Properties
on pv1.Property_Id equals p.Id
where t.Id==null
orderby p.Id
select new
{
p.Id,
pv1.StringValue,
pv1.Revision
}

Linq to entities Left Join

I want to achieve the following in Linq to Entities:
Get all Enquiries that have no Application, or whose Application has a status != 4 (Completed)
select enq.*
from Enquiry enq
left outer join Application app
on enq.enquiryid = app.enquiryid
where app.Status <> 4 or app.enquiryid is null
Has anyone done this before without using DefaultIfEmpty(), which is not supported by Linq to Entities?
I'm trying to add a filter to an IQueryable query like this:
IQueryable<Enquiry> query = Context.EnquirySet;
query = (from e in query
where e.Applications.DefaultIfEmpty()
.Where(app=>app.Status != 4).Count() >= 1
select e);
Thanks
Mark
In EF 4.0+, LEFT JOIN syntax is a little different and presents a crazy quirk:
var query = from c1 in db.Category
join c2 in db.Category on c1.CategoryID equals c2.ParentCategoryID
into ChildCategory
from cc in ChildCategory.DefaultIfEmpty()
select new CategoryObject
{
CategoryID = c1.CategoryID,
ChildName = cc.CategoryName
}
If you capture the execution of this query in SQL Server Profiler, you will see that it does indeed perform a LEFT OUTER JOIN. HOWEVER, if you have multiple LEFT JOIN ("Group Join") clauses in your Linq-to-Entity query, I have found that the self-join clause MAY actually execute as an INNER JOIN - EVEN IF THE ABOVE SYNTAX IS USED!
The resolution to that? As crazy and, according to MS, wrong as it sounds, I resolved this by changing the order of the join clauses. If the self-referencing LEFT JOIN clause was the 1st Linq Group Join, SQL Profiler reported an INNER JOIN. If the self-referencing LEFT JOIN clause was the LAST Linq Group Join, SQL Profiler reported a LEFT JOIN.
Do this:
IQueryable<Enquiry> query = Context.EnquirySet;
query = (from e in query
where (!e.Applications.Any())
|| e.Applications.Any(app => app.Status != 4)
select e);
I don't find LINQ's handling of the problem of what would be an "outer join" in SQL "goofy" at all. The key to understanding it is to think in terms of an object graph with nullable properties rather than a tabular result set.
Any() maps to EXISTS in SQL, so it's far more efficient than Count() in some cases.
Thanks guys for your help. I went for this option in the end but your solutions have helped broaden my knowledge.
IQueryable<Enquiry> query = Context.EnquirySet;
query = query.Except(from e in query
from a in e.Applications
where a.Status == 4
select e);
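The two filters above are not equivalent when an enquiry has several applications. !Any() || Any(Status != 4) keeps an enquiry as long as at least one of its applications is not completed, while the Except version drops any enquiry that has a completed application at all. A pure-Ruby sketch of both predicates over hypothetical data:

```ruby
# Hypothetical enquiries, each mapped to the statuses of its applications.
enquiries = {
  "no apps"   => [],
  "open only" => [2],
  "completed" => [4],
  "mixed"     => [4, 3]
}

# Where(!e.Applications.Any() || e.Applications.Any(app => app.Status != 4))
any_version = enquiries.select { |_, statuses| statuses.empty? || statuses.any? { |s| s != 4 } }.keys

# query.Except(enquiries that have an application with Status == 4)
except_version = enquiries.reject { |_, statuses| statuses.include?(4) }.keys

p any_version     # => ["no apps", "open only", "mixed"]
p except_version  # => ["no apps", "open only"]
```

The SQL in the question behaves like the Any version here (it would also return "mixed"), so which one is right depends on what "completed" should mean for an enquiry with several applications.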
Because of Linq's goofy (read non-standard) way of handling outers, you have to use DefaultIfEmpty().
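What you can do is materialize your two Linq-To-Entities queries as IEnumerables, then LEFT JOIN them in memory using DefaultIfEmpty(). It may look something like this (ApplicationSet is an assumed name, by analogy with EnquirySet):
// ApplicationSet is assumed; use whatever your context calls the Application set.
var enq = Context.EnquirySet.AsEnumerable();
var app = Context.ApplicationSet.AsEnumerable();
var x = (from e in enq
         join a in app on e.enquiryid equals a.enquiryid into ae
         from appEnq in ae.DefaultIfEmpty()
         where appEnq == null || appEnq.Status != 4
         select e).Distinct();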
Just because you can't do it with Linq-To-Entities doesn't mean you can't do it with raw Linq.
(Note: before anyone downvotes me ... yes, I know there are more elegant ways to do this. I'm just trying to make it understandable. It's the concept that's important, right?)
Another thing to consider: if you directly reference any properties in your where clause from a left-joined group (using the into syntax) without checking for null, Entity Framework will still convert your LEFT JOIN into an INNER JOIN.
To avoid this, filter on the "from x in leftJoinedExtent" part of your query like so:
var y = from parent in thing
join child in subthing on parent.ID equals child.ParentID into childTemp
from childLJ in childTemp.Where(c => c.Visible == true).DefaultIfEmpty()
where parent.ID == 123
select new {
ParentID = parent.ID,
ChildID = childLJ.ID
};
ChildID in the anonymous type will be a nullable type and the query this generates will be a LEFT JOIN.
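The same effect in miniature: a condition on the child applied after the outer join discards parents whose child rows all fail it (including parents with no children, whose padded row has no Visible value), whereas filtering the child group before DefaultIfEmpty() keeps every parent. A pure-Ruby sketch with hypothetical rows, where nil plays the role of the NULL child and `left_join` is a hypothetical helper:

```ruby
parents  = [{ id: 123 }, { id: 124 }]
children = [{ id: 1, parent_id: 123, visible: false },
            { id: 2, parent_id: 124, visible: true }]

# Hypothetical helper mirroring "join ... into childTemp / from c in childTemp.DefaultIfEmpty()".
def left_join(parents, children)
  parents.flat_map do |parent|
    group = children.select { |c| c[:parent_id] == parent[:id] }
    group = yield(group) if block_given?  # optional filter on the joined group
    group.empty? ? [[parent, nil]] : group.map { |c| [parent, c] }
  end
end

# Filter in the outer where: parent 123 disappears (inner-join behaviour).
filtered_after = left_join(parents, children).select { |_, c| c && c[:visible] }

# Filter on the group before DefaultIfEmpty(): parent 123 is kept with a nil child.
filtered_before = left_join(parents, children) { |g| g.select { |c| c[:visible] } }

p(filtered_after.map { |par, c| [par[:id], c && c[:id]] })   # => [[124, 2]]
p(filtered_before.map { |par, c| [par[:id], c && c[:id]] })  # => [[123, nil], [124, 2]]
```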

Linq-to-entities - Include() method not loading

If I use a join, the Include() method no longer works, e.g.:
from e in dc.Entities.Include("Properties")
join i in dc.Items on e.ID equals i.Member.ID
where (i.Collection.ID == collectionID)
select e
e.Properties is not loaded
Without the join, the Include() works
Lee
UPDATE: Actually I recently added another Tip that covers this, and provides an alternate probably better solution. The idea is to delay the use of Include() until the end of the query, see this for more information: Tip 22 - How to make include really include
There is a known limitation in the Entity Framework when using Include().
Certain operations are just not supported with Include.
It looks like you may have run into one of those limitations; to work around it, you could try something like this:
var results =
from e in dc.Entities //Notice no include
join i in dc.Items on e.ID equals i.Member.ID
where (i.Collection.ID == collectionID)
select new {Entity = e, Properties = e.Properties};
This will bring back the Properties, and if the relationship between entity and Properties is a one to many (but not a many to many) you will find that each resulting anonymous type has the same values in:
anonType.Entity.Properties
anonType.Properties
This is a side-effect of a feature in the Entity Framework called relationship fixup.
See Tip 1 in my EF Tips series for more information.
Try this:
var query = (ObjectQuery<Entity>)(from e in dc.Entities
                                  join i in dc.Items on e.ID equals i.Member.ID
                                  where (i.Collection.ID == collectionID)
                                  select e);
return query.Include("Properties");
So what is the name of the navigation property on "Entity" that relates to "Item.Member" (i.e., the other end of that navigation)? You should be using it instead of the join. For example, if "Entity" had a property called Member with a cardinality of 1, and Member had a property called Items with a cardinality of many, you could do this:
from e in dc.Entities.Include("Properties")
where e.Member.Items.Any(i => i.Collection.ID == collectionID)
select e
I'm guessing at the properties of your model here, but this should give you the general idea. In most cases, using join in LINQ to Entities is wrong, because it suggests that either your navigational properties are not set up correctly, or you are not using them.
So, I realise I am late to the party here, however I thought I'd add my findings. This should really be a comment on Alex James's post, but as I don't have the reputation it'll have to go here.
So my answer is: it doesn't seem to work at all as you would intend. Alex James gives two interesting solutions, however if you try them and check the SQL, it's horrible.
The example I was working on is:
var theRelease = from release in context.Releases
where release.Name == "Hello World"
select release;
var allProductionVersions = from prodVer in context.ProductionVersions
where prodVer.Status == 1
select prodVer;
var combined = (from release in theRelease
join p in allProductionVersions on release.Id equals p.ReleaseID
select release).Include(release => release.ProductionVersions);
var allProductionsForChosenRelease = combined.ToList();
This follows the simpler of the two examples. Without the include it produces perfectly respectable SQL:
SELECT
[Extent1].[Id] AS [Id],
[Extent1].[Name] AS [Name]
FROM [dbo].[Releases] AS [Extent1]
INNER JOIN [dbo].[ProductionVersions] AS [Extent2] ON [Extent1].[Id] = [Extent2].[ReleaseID]
WHERE ('Hello World' = [Extent1].[Name]) AND (1 = [Extent2].[Status])
But with it, OMG:
SELECT
[Project1].[Id1] AS [Id],
[Project1].[Id] AS [Id1],
[Project1].[Name] AS [Name],
[Project1].[C1] AS [C1],
[Project1].[Id2] AS [Id2],
[Project1].[Status] AS [Status],
[Project1].[ReleaseID] AS [ReleaseID]
FROM ( SELECT
[Extent1].[Id] AS [Id],
[Extent1].[Name] AS [Name],
[Extent2].[Id] AS [Id1],
[Extent3].[Id] AS [Id2],
[Extent3].[Status] AS [Status],
[Extent3].[ReleaseID] AS [ReleaseID],
CASE WHEN ([Extent3].[Id] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C1]
FROM [dbo].[Releases] AS [Extent1]
INNER JOIN [dbo].[ProductionVersions] AS [Extent2] ON [Extent1].[Id] = [Extent2].[ReleaseID]
LEFT OUTER JOIN [dbo].[ProductionVersions] AS [Extent3] ON [Extent1].[Id] = [Extent3].[ReleaseID]
WHERE ('Hello World' = [Extent1].[Name]) AND (1 = [Extent2].[Status])
) AS [Project1]
ORDER BY [Project1].[Id1] ASC, [Project1].[Id] ASC, [Project1].[C1] ASC
Total garbage. The key point to note here is the fact that it returns the outer joined version of the table which has not been limited by status=1.
This results in the WRONG data being returned:
Id  Id1  Name         C1  Id2  Status  ReleaseID
2   1    Hello World  1   1    2       1
2   1    Hello World  1   2    1       1
Note that the status of 2 is being returned there, despite our restriction. It simply does not work.
If I have gone wrong somewhere, I would be delighted to find out, as this is making a mockery of Linq. I love the idea, but the execution doesn't seem to be usable at the moment.
Out of curiosity, I tried the LinqToSQL dbml rather than the LinqToEntities edmx that produced the mess above:
SELECT [t0].[Id], [t0].[Name], [t2].[Id] AS [Id2], [t2].[Status], [t2].[ReleaseID], (
SELECT COUNT(*)
FROM [dbo].[ProductionVersions] AS [t3]
WHERE [t3].[ReleaseID] = [t0].[Id]
) AS [value]
FROM [dbo].[Releases] AS [t0]
INNER JOIN [dbo].[ProductionVersions] AS [t1] ON [t0].[Id] = [t1].[ReleaseID]
LEFT OUTER JOIN [dbo].[ProductionVersions] AS [t2] ON [t2].[ReleaseID] = [t0].[Id]
WHERE ([t0].[Name] = @p0) AND ([t1].[Status] = @p1)
ORDER BY [t0].[Id], [t1].[Id], [t2].[Id]
Slightly more compact - weird count clause, but overall same total FAIL.
Has anybody actually ever used this stuff in a real business application? I'm really starting to wonder...
Please tell me I've missed something obvious, as I really want to like Linq!
Try the more verbose way, which should obtain the same results but with more database calls:
var mydata = from e in dc.Entities
join i in dc.Items
on e.ID equals i.Member.ID
where (i.Collection.ID == collectionID)
select e;
foreach (Entity ent in mydata) {
if(!ent.Properties.IsLoaded) { ent.Properties.Load(); }
}
Do you still get the same (unexpected) result?
EDIT: Changed the first sentence, as it was incorrect. Thanks for the pointer comment!

Resources