Doctrine result cache bug with LEFT JOIN ... WITH condition

I tend to find answers before I need to post a question here, but today I can't seem to find out what is wrong.
We're using Doctrine 2.1.2 in a Symfony 2 app, and in a repository we have two methods that use almost the same query.
The only difference between method A and method B is that there is a condition added to a JOIN that is common to both queries.
The problem is that Doctrine seems to use the same result cache for both queries.
When we call method A, method B uses the cache from A, and the other way around.
I have been using expireResultCache(true) and useResultCache(false), to no avail.
Here's what the queries look like:
-- method A
SELECT DISTINCT a, b, c FROM MyBundle:ObjectA a INDEX BY a.id
LEFT JOIN a.fkObjectB b
LEFT JOIN a.fkObjectC c
-- method B
SELECT DISTINCT a, b, c FROM MyBundle:ObjectA a INDEX BY a.id
LEFT JOIN a.fkObjectB b WITH b.some_field IS NULL
LEFT JOIN a.fkObjectC c
When I use getSQL(), I see that they produce different SQL queries, as expected. When run independently against the database, the generated queries do return different results.
This leads me to believe that it may be an annoying result cache bug, where Doctrine does not include the JOIN conditions in the cache key, only the table names.
Is this a bug, or is there something I can do?
EDIT: Still happening in Doctrine 2.1.6.
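The observation that the two generated statements return different rows can be checked outside Doctrine. Below is a minimal sqlite3 sketch with hypothetical tables object_a and object_b, assuming Doctrine renders the WITH condition as an extra predicate in the SQL ON clause. Note that both statements return the same set of a rows; only the joined b values differ.

```python
import sqlite3


def run_queries():
    """Run the plain and the conditional LEFT JOIN against toy data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE object_b (id INTEGER PRIMARY KEY, some_field TEXT);
        CREATE TABLE object_a (id INTEGER PRIMARY KEY, fk_object_b INTEGER);
        INSERT INTO object_b VALUES (10, NULL), (20, 'set');
        INSERT INTO object_a VALUES (1, 10), (2, 20);
    """)
    # Method A: plain LEFT JOIN, every related b row is returned.
    plain = conn.execute("""
        SELECT a.id, b.id FROM object_a a
        LEFT JOIN object_b b ON b.id = a.fk_object_b
        ORDER BY a.id
    """).fetchall()
    # Method B: the extra condition lives in the ON clause, so every a row
    # is kept, but b is NULL wherever some_field is not NULL.
    filtered = conn.execute("""
        SELECT a.id, b.id FROM object_a a
        LEFT JOIN object_b b ON b.id = a.fk_object_b AND b.some_field IS NULL
        ORDER BY a.id
    """).fetchall()
    conn.close()
    return plain, filtered


plain, filtered = run_queries()
print(plain)     # [(1, 10), (2, 20)]
print(filtered)  # [(1, 10), (2, None)]
```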

I think the problem you have is fixed in Doctrine 2.2. I had a similar problem related to the result cache; here is my question and its answers.

Just to expand on michel v's comment, Doctrine 2 is fetching the same object instance both times via the identity map pattern.
Calling:
EntityManager#clear()
clears the identity map and forces the EntityManager to fetch a fresh copy of the object from the database.
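A toy model of the identity-map behaviour described above, in Python. ToyIdentityMap and Entity are hypothetical names and this is not Doctrine's UnitOfWork; it only assumes what the answer says: a second query that returns an already-managed object hands back the same instance, whose collection was hydrated by the first query.

```python
class Entity:
    """A managed object whose collection is filled on first hydration."""

    def __init__(self, entity_id, children):
        self.id = entity_id
        self.children = list(children)


class ToyIdentityMap:
    """Hypothetical stand-in for an ORM identity map (one instance per id)."""

    def __init__(self):
        self._objects = {}

    def hydrate(self, entity_id, children):
        obj = self._objects.get(entity_id)
        if obj is None:
            obj = Entity(entity_id, children)
            self._objects[entity_id] = obj
        # An instance that is already managed is returned unchanged: the
        # freshly fetched rows are not merged into its collection.
        return obj

    def clear(self):
        """Forget every managed instance, like EntityManager#clear()."""
        self._objects.clear()


em = ToyIdentityMap()
from_a = em.hydrate(1, ["b1", "b2"])  # method A: all related b rows
from_b = em.hydrate(1, ["b1"])        # method B: only b rows matching WITH
print(from_b is from_a, from_b.children)  # True ['b1', 'b2']

em.clear()
fresh = em.hydrate(1, ["b1"])
print(fresh.children)  # ['b1']
```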

Related

Why does Hibernate ignore setMaxResults?

I am using server-side pagination for one of my tables with a CriteriaQuery and a TypedQuery, and I set the following values:
typedQuery.setFirstResult(0);
typedQuery.setMaxResults(100);
Unfortunately, in the generated SQL query that is executed on the Oracle DB, I never see the ROWNUM condition. I also added an ORDER BY to my TypedQuery, but the query still does a simple select without limiting the results in the database.
As a result, I am getting the following warning: HHH000104: firstResult/maxResults specified with collection fetch; applying in memory! In other words, Hibernate does the pagination in memory because it is not performed in the database. About this warning I read the following article https://vladmihalcea.com/fix-hibernate-hhh000104-entity-fetch-pagination-warning-message/ but before splitting my query into two queries (retrieve the ids, then retrieve the data for those ids), I wanted to give setMaxResults a try. Still, I wonder why the generated query doesn't contain a ROWNUM condition as expected.
Further information:
DB: Oracle 18
Dialect: org.hibernate.dialect.Oracle12cDialect
Hibernate: 5.3.15
JDK: 11
You have to understand that first/max results apply at the entity level if you select an entity. If you fetch join collection attributes, or use an entity graph for collection attributes, you change the cardinality of the rows returned by JDBC for each entity: the main entity's row is duplicated for every collection element. The effect is that Hibernate can no longer paginate in SQL (with ROWNUM or otherwise), which is why you are not seeing it in the query. If you remove the fetch join, you will see the limit applied in SQL (with Oracle12cDialect this is typically an OFFSET/FETCH FIRST clause rather than ROWNUM).
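To make the cardinality argument concrete, here is a simplified Python model of in-memory pagination (not Hibernate's actual code): the joined rows are grouped by parent first, and only then is the page sliced. The function name paginate_in_memory is hypothetical.

```python
from itertools import groupby


def paginate_in_memory(joined_rows, first_result, max_results):
    """Group (parent_id, child) rows by parent, then slice out one page.

    Rows must already be ordered by parent_id, as a fetch-join query
    with ORDER BY would return them.
    """
    parents = [
        (parent_id, [child for _, child in group])
        for parent_id, group in groupby(joined_rows, key=lambda row: row[0])
    ]
    return parents[first_result:first_result + max_results]


rows = [(1, "a"), (1, "b"), (1, "c"), (2, "d"), (3, "e"), (3, "f")]
page = paginate_in_memory(rows, 0, 2)
print(page)      # [(1, ['a', 'b', 'c']), (2, ['d'])]

# Limiting the joined rows instead (what a row limit of 2 would do)
# truncates parent 1's collection and never reaches parent 2.
print(rows[:2])  # [(1, 'a'), (1, 'b')]
```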
Having said that, this is a perfect use case for Blaze-Persistence.
Blaze-Persistence is a query builder on top of JPA which supports many of the advanced DBMS features on top of the JPA model. The pagination support it comes with handles all of the issues you might encounter.
It also has a Spring Data integration, so you can use the same code like you do now, you only have to add the dependency and do the setup: https://persistence.blazebit.com/documentation/entity-view/manual/en_US/index.html#spring-data-setup
Blaze-Persistence has many different strategies for pagination which you can configure. The default strategy is to inline the query for ids into the main query. Something like this:
select u
from User u
left join fetch u.notes
where u.id IN (
select u2.id
from User u2
order by ...
limit ...
)
order by ...
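The id-subquery strategy can be tried in plain SQL. This is a sketch against SQLite with hypothetical users/notes tables, not the SQL that Blaze-Persistence generates; SQLite accepts LIMIT/OFFSET inside an IN subquery, while some databases do not.

```python
import sqlite3


def page_users_with_notes(conn, page_size, offset):
    """Fetch one page of users plus all of their notes in one statement.

    LIMIT/OFFSET is applied to the id subquery, so every selected user
    keeps all of its note rows.
    """
    return conn.execute("""
        SELECT u.id, n.text
        FROM users u
        LEFT JOIN notes n ON n.user_id = u.id
        WHERE u.id IN (
            SELECT u2.id FROM users u2 ORDER BY u2.id LIMIT ? OFFSET ?
        )
        ORDER BY u.id, n.id
    """, (page_size, offset)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE notes (id INTEGER PRIMARY KEY, user_id INTEGER, text TEXT);
    INSERT INTO users VALUES (1), (2), (3);
    INSERT INTO notes VALUES (1, 1, 'n1'), (2, 1, 'n2'), (3, 1, 'n3'), (4, 3, 'n4');
""")
print(page_users_with_notes(conn, 2, 0))
# [(1, 'n1'), (1, 'n2'), (1, 'n3'), (2, None)]
```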
When joining data, the parent is duplicated n times. For example:
select p from Post p join p.comments
If a post has 20 comments, that post is returned 20 times, once with each comment.
Limiting rows in this case doesn't make sense, because the actual number of returned posts won't equal the page size. In other words, limiting the page to 20 rows will return only one post.
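A quick SQLite check of the Post/comments example, with hypothetical tables: one post has 20 comments, and LIMIT 20 on the joined rows returns only that post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE comment (id INTEGER PRIMARY KEY, post_id INTEGER)")
conn.executemany("INSERT INTO post VALUES (?)", [(1,), (2,), (3,)])
# Post 1 has 20 comments; posts 2 and 3 have one each.
conn.executemany("INSERT INTO comment (post_id) VALUES (?)",
                 [(1,)] * 20 + [(2,), (3,)])

rows = conn.execute("""
    SELECT p.id, c.id FROM post p
    JOIN comment c ON c.post_id = p.id
    ORDER BY p.id, c.id
    LIMIT 20
""").fetchall()
distinct_posts = sorted({post_id for post_id, _ in rows})
print(len(rows), distinct_posts)  # 20 [1]
```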

making inner join linq or using [table].[joiningTable].column is the same?

I have recently started working with LINQ, and I was wondering: suppose I have 2 related tables, Project (<=with fkAccessLevelId) and AccessLevel, and I just want to select values from both tables. There are 2 ways I can select values from these tables.
The one I commonly use is:
(from P in DataContext.Projects
join AL in DataContext.AccessLevel
on P.AccessLevelId equals AL.AccessLevelId
select new
{
ProjectName = P.Name,
Access = AL.AccessName
})
Another way of doing this would be:
(from P in DataContext.Projects
select new
{
ProjectName = P.Name,
Access = P.AccessLevel.AccessName
})
What I want to know is which of these is more efficient if we increase the number of tables to, say, 5-6, with 1-2 tables containing thousands of records.
You should take a look at the generated SQL. There are several potential performance bottlenecks in a LINQ query (in this case I assume LINQ to SQL?!), and the usual main bottleneck is the SQL query on the server.
SQL Server typically has a very good optimizer, so the same query, written in different but equivalent ways, usually performs about the same.
However, in your case there is a very real difference between the two queries. A project with no access level would not appear in the first query, whilst the second query would return it with a null AccessName. In effect, you would be comparing an INNER JOIN to a LEFT JOIN.
TL;DR: With SQL Server and LINQ to SQL/Entity Framework, queries that do the same thing should perform similarly. However, your queries are far from similar.
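The INNER vs LEFT JOIN difference can be reproduced in SQL. This SQLite sketch assumes hypothetical Project/AccessLevel tables, and assumes that the explicit join clause maps to an INNER JOIN while navigating a nullable foreign key maps to a LEFT JOIN; check the SQL your provider actually generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE AccessLevel (AccessLevelId INTEGER PRIMARY KEY, AccessName TEXT);
    CREATE TABLE Project (ProjectId INTEGER PRIMARY KEY, Name TEXT, AccessLevelId INTEGER);
    INSERT INTO AccessLevel VALUES (1, 'Public');
    INSERT INTO Project VALUES (1, 'Alpha', 1), (2, 'Beta', NULL);
""")
# Roughly what the explicit join clause corresponds to.
inner = conn.execute("""
    SELECT P.Name, AL.AccessName FROM Project P
    INNER JOIN AccessLevel AL ON P.AccessLevelId = AL.AccessLevelId
    ORDER BY P.ProjectId
""").fetchall()
# Roughly what navigating a nullable foreign key corresponds to.
left = conn.execute("""
    SELECT P.Name, AL.AccessName FROM Project P
    LEFT JOIN AccessLevel AL ON P.AccessLevelId = AL.AccessLevelId
    ORDER BY P.ProjectId
""").fetchall()
print(inner)  # [('Alpha', 'Public')]
print(left)   # [('Alpha', 'Public'), ('Beta', None)]
```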

How do i get an inner join in a WCF Data Service

Let's say I have 2 tables, table1 and table2, with a shared key "id".
If I want an inner join of those two tables using SQL, I'd do something like
select table1.id, x, y, z
from table1
inner join table2
on table1.id = table2.id
This returns only the rows of table1 whose id also occurs in table2.
How do I get the equivalent in WCF Data Services/OData LINQ syntax?
I'm expecting something like:
var q = (from t in svc.Table1.Expand("Table2")
where t.Table2.Any()
select t) as DataServiceQuery<Table1>;
but that gets me an exception about Any().
I've tried .Join and that isn't supported either.
I've tried .Count and that fails too.
.Intersect looks like it only takes another enumerable, so that doesn't look like what I want...
I think I'm missing something really obvious or simple...
Edit: this appears to be a duplicate of "How do I use OData Expand like a SQL join?"
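For reference, here is the difference between the SQL inner join above and the semi-join that where t.Table2.Any() expresses, sketched in SQLite with hypothetical data in which id 1 has two matching rows in table2. This does not show any OData syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, x TEXT);
    CREATE TABLE table2 (id INTEGER, y TEXT, z TEXT);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO table2 VALUES (1, 'y1', 'z1'), (1, 'y2', 'z2'), (3, 'y3', 'z3');
""")
# Inner join: one output row per matching pair, so id 1 appears twice.
joined = conn.execute("""
    SELECT table1.id, x, y, z FROM table1
    INNER JOIN table2 ON table1.id = table2.id
    ORDER BY table1.id, y
""").fetchall()
# Semi-join (an "Any" filter): each table1 row appears at most once.
semi = conn.execute("""
    SELECT t.id, t.x FROM table1 t
    WHERE EXISTS (SELECT 1 FROM table2 u WHERE u.id = t.id)
    ORDER BY t.id
""").fetchall()
print(joined)  # [(1, 'a', 'y1', 'z1'), (1, 'a', 'y2', 'z2'), (3, 'c', 'y3', 'z3')]
print(semi)    # [(1, 'a'), (3, 'c')]
```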
Take a look at the answers to this type of question. The current version of WCF Data Services (OData) does not support joins even if your underlying data contract does (i.e. if you're layering on top of Entity Framework 4 for instance).
The more recent releases of WCF Data Services now include Any/All support. See What's New in WCF Data Services 5.0
Currently the OData protocol (and thus WCF Data Services) doesn't support any/all operations. It also doesn't support arbitrary joins, although some joins can be expressed as navigations.
Your query is currently not supported, but we're looking into adding support for the any/all operations. Take a look at this proposal if that would fulfill your needs:
http://www.odata.org/blog/support-for-any-and-all
Is WCF in play here? To link two objects from two tables/lists, I'd do this:
var result =
from o1 in Table1
join o2 in Table2 on o1.id equals o2.id
select new { o1, o2 };
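LINQ to Objects evaluates an equi-join like this roughly as a hash join: it builds a lookup over the inner sequence and probes it with each outer element. The Python sketch below illustrates that idea; hash_join and the sample data are hypothetical.

```python
from collections import defaultdict


def hash_join(outer, inner, outer_key, inner_key):
    """Equi-join two in-memory sequences, yielding (outer, inner) pairs."""
    lookup = defaultdict(list)
    for item in inner:        # build phase: index the inner sequence by key
        lookup[inner_key(item)].append(item)
    for o in outer:           # probe phase: preserves the outer order
        for i in lookup.get(outer_key(o), []):
            yield o, i


table1 = [{"id": 1, "x": "a"}, {"id": 2, "x": "b"}]
table2 = [{"id": 1, "y": "y1"}, {"id": 3, "y": "y3"}]
pairs = list(hash_join(table1, table2, lambda o: o["id"], lambda i: i["id"]))
print(pairs)  # [({'id': 1, 'x': 'a'}, {'id': 1, 'y': 'y1'})]
```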

How to automatically exclude items already visited in recommendation algorithm?

I'm now using Slope One for recommendations.
How do I exclude already visited items from the results?
I can't simply filter the visited ones with not in (visited_id_list), because that won't scale for a long-time user who has visited many items.
I've come up with a solution without not in:
select b.property,count(b.id) total from propertyviews a
left join propertyviews b on b.cookie=a.cookie
left join propertyviews c on c.cookie=0 and b.property=c.property
where a.property=1 and a.cookie!=0 and c.property is null
group by b.property order by total;
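The self-join above can be run as-is in SQLite. In this sketch the data are hypothetical, and cookie 0 is assumed to be the current visitor (as c.cookie=0 suggests): the second LEFT JOIN finds properties that visitor has already seen, and c.property is null keeps only the unseen ones. As written, order by total sorts ascending, so append DESC to put the most co-viewed property first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE propertyviews (id INTEGER PRIMARY KEY, cookie INTEGER, property INTEGER);
    -- cookie 0 plays the current visitor, who has already seen properties 1 and 2
    INSERT INTO propertyviews (cookie, property) VALUES
        (0, 1), (0, 2),
        (5, 1), (5, 2), (5, 3),
        (6, 1), (6, 3), (6, 4),
        (7, 2), (7, 4);
""")
rows = conn.execute("""
    select b.property, count(b.id) total from propertyviews a
    left join propertyviews b on b.cookie=a.cookie
    left join propertyviews c on c.cookie=0 and b.property=c.property
    where a.property=1 and a.cookie!=0 and c.property is null
    group by b.property order by total
""").fetchall()
print(rows)  # [(4, 1), (3, 2)]
```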
Seriously, if you are using MySQL, look at section 12.2.10.3, "Subqueries with ANY, IN, and SOME", of the MySQL manual.
For example:
SELECT s1 FROM t1 WHERE s1 IN (SELECT s1 FROM t2);
This is available in all versions of MySQL I looked at, albeit that the section numbers in the manual are different in the older versions.
EDIT in response to the OP's comment:
OK ... how about something like SELECT id FROM t1 WHERE ... AND NOT id IN (SELECT seen_id FROM user_seen_ids where user = ? ). This form avoids having to pass thousands of ids in the SQL statement.
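A sketch of that subquery form in SQLite, with hypothetical tables t1 and user_seen_ids (user_id stands in for the answer's user column). A NOT EXISTS variant is shown too; unlike NOT IN, it still returns rows if the subquery ever yields a NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER PRIMARY KEY);
    CREATE TABLE user_seen_ids (user_id TEXT, seen_id INTEGER NOT NULL);
    INSERT INTO t1 VALUES (1), (2), (3), (4), (5);
    INSERT INTO user_seen_ids VALUES ('u1', 2), ('u1', 4), ('u2', 1);
""")
# The seen ids stay in the database; only the user id is passed in.
not_in = conn.execute(
    "SELECT id FROM t1 WHERE id NOT IN "
    "(SELECT seen_id FROM user_seen_ids WHERE user_id = ?) ORDER BY id",
    ("u1",),
).fetchall()
# Equivalent anti-join written with NOT EXISTS.
not_exists = conn.execute(
    "SELECT id FROM t1 WHERE NOT EXISTS "
    "(SELECT 1 FROM user_seen_ids s WHERE s.user_id = ? AND s.seen_id = t1.id) "
    "ORDER BY id",
    ("u1",),
).fetchall()
print(not_in)      # [(1,), (3,), (5,)]
print(not_exists)  # [(1,), (3,), (5,)]
```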
If you want to entirely avoid the "test against a list of ids" part of the query, I don't see how it is even possible in theory, let alone how you would implement it.
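If the candidate items are scored in application code, skipping visited items is a constant-time dictionary check per candidate rather than a growing NOT IN list. Below is a minimal weighted Slope One sketch in Python under that assumption; train, recommend, and the ratings are hypothetical and not the asker's implementation.

```python
from collections import defaultdict


def train(ratings):
    """Build weighted Slope One deviations from {user: {item: rating}}."""
    diff = defaultdict(lambda: defaultdict(float))
    freq = defaultdict(lambda: defaultdict(int))
    for user_ratings in ratings.values():
        for j, r_j in user_ratings.items():
            for i, r_i in user_ratings.items():
                if i != j:
                    diff[j][i] += r_j - r_i
                    freq[j][i] += 1
    dev = {j: {i: diff[j][i] / freq[j][i] for i in diff[j]} for j in diff}
    return dev, freq


def recommend(user_ratings, dev, freq):
    """Score only the items this user has not rated (visited) yet."""
    scores = {}
    for j in dev:
        if j in user_ratings:  # dict lookup: skipping a seen item is O(1)
            continue
        num = den = 0.0
        for i, r_i in user_ratings.items():
            count = freq[j].get(i, 0)
            if count:
                num += (dev[j][i] + r_i) * count
                den += count
        if den:
            scores[j] = num / den
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


ratings = {
    "alice": {"p1": 5, "p2": 3, "p3": 2},
    "bob": {"p1": 3, "p2": 4},
    "carol": {"p2": 2, "p3": 5},
}
dev, freq = train(ratings)
print(recommend(ratings["bob"], dev, freq))  # [('p3', 3.3333333333333335)]
```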

Outer Joins with Subsonic 3.0

Does anyone know of a way to do a left outer join with SubSonic 3.0 or another way to approach this problem? What I am trying to accomplish is that I have one table for departments and another table for divisions. A department can have multiple divisions. I need to display a list of departments with the divisions it contains. Getting back a collection of departments which each contain a collection of divisions would be ideal, but I would take a flattened result table too.
Using the LINQ syntax seems to be broken (I am new to LINQ though and may be using it wrong), for example this throws an ArgumentException error:
var allDepartments = from div in Division.All()
join dept in Department.All() on div.DepartmentId equals dept.Id into divdept
select divdept;
So I figured I could fall back to using the SubSonic query syntax. This code however generates an INNER JOIN instead of an OUTER JOIN:
List<Department> allDepartments = new Select()
.From<Department>()
.LeftOuterJoin<Division>(DepartmentsTable.IdColumn, DivisionsTable.DepartmentIdColumn)
.ExecuteTypedList<Department>();
Any help would be appreciated. I am not having much luck with SubSonic 3. I really enjoyed using SubSonic 2 and may go back to that if I can't figure out something as basic as a left join.
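For reference, this is the flattened result a left outer join should give, sketched in SQLite with hypothetical Departments/Divisions tables (not SQL generated by SubSonic): a department without divisions still appears, with NULL for the division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Departments (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Divisions (Id INTEGER PRIMARY KEY, DepartmentId INTEGER, Name TEXT);
    INSERT INTO Departments VALUES (1, 'Sales'), (2, 'Legal');
    INSERT INTO Divisions VALUES (1, 1, 'North'), (2, 1, 'South');
""")
rows = conn.execute("""
    SELECT d.Name, v.Name
    FROM Departments d
    LEFT OUTER JOIN Divisions v ON v.DepartmentId = d.Id
    ORDER BY d.Id, v.Id
""").fetchall()
print(rows)  # [('Sales', 'North'), ('Sales', 'South'), ('Legal', None)]
```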
Getting back a collection of departments which each contain a collection of divisions would be ideal
SubSonic does this for you (if you set up your relationships correctly in the database); just select all Departments:
var depts = Model.Department.All();
There will be a property in each item of depts named Divisions, which contains a collection of Division objects.
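If you take the flattened route instead, nesting the rows into a department-to-divisions mapping is a short loop. A minimal Python sketch; nest and the sample rows are hypothetical, and it assumes a None division only ever comes from a LEFT JOIN row with no match.

```python
def nest(rows):
    """Group flat (department, division-or-None) rows into {department: [divisions]}."""
    departments = {}
    for department, division in rows:
        divisions = departments.setdefault(department, [])
        if division is not None:  # None marks a department with no divisions
            divisions.append(division)
    return departments


rows = [("Sales", "North"), ("Sales", "South"), ("Legal", None)]
print(nest(rows))  # {'Sales': ['North', 'South'], 'Legal': []}
```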