ActiveRecord Subquery Inner Join - ruby

I am trying to convert a "raw" PostGIS SQL query into a Rails ActiveRecord query. My goal is to convert two sequential ActiveRecord queries (each taking ~1ms) into a single ActiveRecord query taking (~1ms). Using the SQL below with ActiveRecord::Base.connection.execute I was able to validate the reduction in time.
Thus, my direct request is to help me to convert this query into an ActiveRecord query (and the best way to execute it).
SELECT COUNT(*)
FROM "users"
INNER JOIN (
SELECT "centroid"
FROM "zip_caches"
WHERE "zip_caches"."postalcode" = '<postalcode>'
) AS "sub" ON ST_Intersects("users"."vendor_coverage", "sub"."centroid")
WHERE "users"."active" = 1;
NOTE that the value <postalcode> is the only variable data in this query. Obviously, there are two models here User and ZipCache. User has no direct relation to ZipCache.
The current two step ActiveRecord query looks like this.
zip = ZipCache.select(:centroid).where(postalcode: '<postalcode>').limit(1).first
User.where{st_intersects(vendor_coverage, zip.centroid)}.count

Disclamer: I've never used PostGIS
First in your final request, it seems like you've missed the WHERE "users"."active" = 1; part.
Here is what I'd do:
First add a active scope on user (for reusability)
scope :active, -> { User.where(active: 1) }
Then for the actual query, You can have the sub query without executing it and use it in a joins on the User model, such as:
subquery = ZipCache.select(:centroid).where(postalcode: '<postalcode>')
User.active
.joins("INNER JOIN (#{subquery.to_sql}) sub ON ST_Intersects(users.vendor_coverage, sub.centroid)")
.count
This allow minimal raw SQL, while keeping only one query.
In any case, check the actual sql request in your console/log by setting the logger level to debug.

The amazing tool scuttle.io is perfect for converting these sorts of queries:
User.select(Arel.star.count).where(User.arel_table[:active].eq(1)).joins(
User.arel_table.join(ZipCach.arel_table).on(
Arel::Nodes::NamedFunction.new(
'ST_Intersects', [
User.arel_table[:vendor_coverage], Sub.arel_table[:centroid]
]
)
).join_sources
)

Related

How can I use Math.X functions with LINQ?

I have a simple table (SQL server and EF6) Myvalues, with columns Id & Value (double)
I'm trying to get the sum of the natural log of all values in this table. My LINQ statement is:
var sum = db.Myvalues.Select(x => Math.Log(x.Value)).Sum();
It compiles fine, but I'm getting a RTE:
LINQ to Entities does not recognize the method 'Double Log(Double)' method, and this method cannot be translated into a store expression.
What am I doing wrong/how can I fix this?
FWIW, I can execute the following SQL query directly against the database which gives me the correct answer:
select exp(sum(LogCol)) from
(select log(Myvalues.Value) as LogCol From Myvalues
) results
LINQ tries to translate Math.Log into a SQL command so it is executed against the DB.
This is not supported.
The first solution (for SQL Server) is to use one of the existing SqlFunctions. More specifically, SqlFunctions.Log.
The other solution is to retrieve all your items from your DB using .ToList(), and execute Math.Log with LINQ to Objects (not LINQ to Entities).
As EF cannot translate Math.Log() you could get your data in memory and execute the function form your client:
var sum = db.Myvalues.ToList().Select(x => Math.Log(x.Value)).Sum();

ActiveRecord 4 cannot retrieve "select AS" field

Ok, I feel really stupid for asking this, but it's driving me nuts and I can't figure it out. The docs say I should be able to use select AS in a Rails/ActiveRecord query. So:
d = Dvd.where(id: 1).select("title AS my_title")
Is a valid query and if I do a to_sql on it, it produces the expected SQL:
SELECT title AS my_title FROM `dvd` WHERE `dvd`.`id` = 1
However, d.my_title will give an error:
NoMethodError: undefined method `my_title' for #<ActiveRecord::Relation
I need to be able to use AS since the columns I want to retrieve from different joins have the same name so I can't access them the "regular" way and have to resort to using AS.
I also don't want to resort to using find_by_sql for future compatibility and a possible switch form Mysql to PostGresql.
Just to clarify, what I'm really trying to do is write this SQL in a Railsy way:
SELECT tracks.name AS track_name, artists.name AS artist_name, composers.name AS composer_name, duration
FROM `tracks_cds`
INNER JOIN `tracks` ON `tracks`.`id` = `tracks_cds`.`track_id`
INNER JOIN `artists` ON `artists`.`id` = `tracks_cds`.`artist_id`
INNER JOIN `composers` ON `composers`.`id` = `tracks_cds`.`composer_id`
WHERE cd_id = cd.id
The top example was just a simplification of the fact that SELECT AS will not give you an easy way to refer to custom fields which I find hard to believe.
ActiveRecord automatically creates getter and setter methods for attributes based on the column names in the database, so there will be none defined for my_title.
Regarding the same common names, why not just do this:
d = Dvd.where(id: 1).select("dvds.title")
You can write your sql query and then just pass into ActiveRecord's execute method
query = "SELECT title AS my_title FROM `dvd` WHERE `dvd`.`id` = 1"
result = ActiveRecord::Base.connection.execute(query)

Get all the includes from an Entity Framework Query?

I've the following Entity Model : Employee has a Company and a Company has Employees.
When using the Include statement like below:
var query = context.Employees.Include(e => e.Company);
query.Dump();
All related data is retrieved from the database correctly. (Using LEFT OUTER JOIN on Company table)
The problem is hat when I use the GroupBy() from System.Linq.Dynamic to group by Company.Name, the Employees are missing the Company data because the Include is lost.
Example:
var groupByQuery = query.GroupBy("new (Company.Name as CompanyName)", "it");
groupByQuery.Dump();
Is there a way to easily retrieve the applied Includes on the 'query' as a string collection, so that I can include them in the dynamic GroupBy like this:
var groupByQuery2 = query.GroupBy("new (Company, Company.Name as CompanyName)", "it");
groupByQuery2.Dump();
I thought about using the ToString() functionality to get the SQL Command like this:
string sql = query.ToString();
And then use RegEx to extract all LEFT OUTER JOINS, but probably there is a better solution ?
if you're creating the query in the first place - I'd always opt to save the includes (and add to them if you're making a composite query/filtering).
e.g. instead of returning just 'query' return new QueryContext {Query = query, Includes = ...}
I'd like to see a more elegant solution - but I think that's your best bet.
Otherwise you're looking at expression trees, visitors and all those nice things.
SQL parsing isn't that straight either - as queries are not always that simple (often a combo of things etc.).
e.g. there is a `span' inside the query object (if you traverse a bit) which seems to be holding the 'Includes' but it's not much help.

NHibernate IQueryable doesn't seem to delay execution

I'm using NHibernate 3.2 and I have a repository method that looks like:
public IEnumerable<MyModel> GetActiveMyModel()
{
return from m in Session.Query<MyModel>()
where m.Active == true
select m;
}
Which works as expected. However, sometimes when I use this method I want to filter it further:
var models = MyRepository.GetActiveMyModel();
var filtered = from m in models
where m.ID < 100
select new { m.Name };
Which produces the same SQL as the first one and the second filter and select must be done after the fact. I thought the whole point in LINQ is that it formed an expression tree that was unravelled when it's needed and therefore the correct SQL for the job could be created, saving my database requests.
If not, it means all of my repository methods have to return exactly what is needed and I can't make use of LINQ further down the chain without taking a penalty.
Have I got this wrong?
Updated
In response to the comment below: I omitted the line where I iterate over the results, which causes the initial SQL to be run (WHERE Active = 1) and the second filter (ID < 100) is obviously done in .NET.
Also, If I replace the second chunk of code with
var models = MyRepository.GetActiveMyModel();
var filtered = from m in models
where m.Items.Count > 0
select new { m.Name };
It generates the initial SQL to retrieve the active records and then runs a separate SQL statement for each record to find out how many Items it has, rather than writing something like I'd expect:
SELECT Name
FROM MyModel m
WHERE Active = 1
AND (SELECT COUNT(*) FROM Items WHERE MyModelID = m.ID) > 0
You are returning IEnumerable<MyModel> from the method, which will cause in-memory evaluation from that point on, even if the underlying sequence is IQueryable<MyModel>.
If you want to allow code after GetActiveMyModel to add to the SQL query, return IQueryable<MyModel> instead.
You're running IEnumerable's extension method "Where" instead of IQueryable's. It will still evaluate lazily and give the same output, however it evaluates the IQueryable on entry and you're filtering the collection in memory instead of against the database.
When you later add an extra condition on another table (the count), it has to lazily fetch each and every one of the Items collections from the database since it has already evaluated the IQueryable before it knew about the condition.
(Yes, I would also like to be the extensive extension methods on IEnumerable to instead be virtual members, but, alas, they're not)

Is this linq query efficient?

Is this linq query efficient?
var qry = ((from member in this.ObjectContext.TreeMembers.Where(m => m.UserId == userId && m.Birthdate == null)
select member.TreeMemberId).Except(from item in this.ObjectContext.FamilyEvents select item.TreeMemberId));
var mainQry = from mainMember in this.ObjectContext.TreeMembers
where qry.Contains(mainMember.TreeMemberId)
select mainMember;
Will this be translated into multiple sql calls or just one? Can it be optimised? Basically I have 2 tables, I want to select those records from table1 where datetime is null and that record should not exist in table2.
The easiest way to find out if the query will make multiple calls is to set the .Log property of the data context. I typically set it to write to a DebugOutputWriter. A good example for this kind of class can be found here.
For a general way of thinking about it however, if you use a property of your class that does not directly map to a database field in a where clause or a join clause, it will typically make multiple calls. From what you have provided, it looks like this is not the case for your scenario, but I can't absolutely certain and suggest using the method listed above.

Resources