Assert a specific and unique value in column - spring

my RDBMS is PostgreSQL. I am using SpringBoot and Hibernate as JPA.
Let's consider a very simple one-column table:
Man
Age: integer
And I would like to implement a such method that add a man to the table. That method should satisfy the following condition:
At most one man in table can be 80 years old
.
#Transactional
void addMan(int age){
....
}
It looks like I need to take a exclusive lock for whole table, yes? How to do it?

There is a second solution.
Use SERIALIZABLE transactions that look like this:
START TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT count(*) FROM man WHERE age = 80;
Now, if the result is 0, continue:
INSERT INTO man VALUES (80);
COMMIT;
If two transactions try to do this concurrently, one will fail with a serialization error.
In that case, you just have to retry the transaction until it succeeds.
Serializable transactions perform worse than transactions with a lower isolation level, but they will still perform better than actually serializing the transactions.

I would add a unique constraint / index to the table and let the database do the work.
create unique index man_age_unique_idx on man (age);
If a record with that age does not already exist, no problem.
If it does exist, you should get back a PersistenceException where the cause is a Hibernate ConstraintViolationException. From that you can get the name of the constraint that was violated, e.g. man_age_unique_idx in the example above.
try {
entityManager.persist(man);
} catch (PersistenceException e) {
if (e.getCause() instanceof ConstraintViolationException) {
ConstraintViolationException cve = (ConstraintViolationException) e.getCause();
if (Objects.equals(cve.getConstraintName(), "man_age_unique_idx")) {
// handle as appropriate... e.g. throw some custom business exception
throw new DuplicateAgeException("Duplicate age: " + man.getAge(), e);
}
} else {
// handle other causes ...
}
}

Related

Spring Boot JPA save() method trying to insert exisiting row

I have a simple kafka consumer that collects events and based on the data in them inserts or updates a record in the database - table has a unique ID constraint on ID column and also in the entity field.
Everything works fine when the table is pre-populated and inserts happen every now and then. However when i truncate the table and send a couple thousand events with limited number of ID (i was doing 50 unique ID within 3k events) then events are processed simultaneously and the save() method randomly fails with Unique constraint violation exception. I debugged it and the outcome is pretty simple.
event1={id = 1 ... //somedata} gets picked up, service method saveOrUpdateRecord() looks for the record by ID=1, finds none, inserts a new record.
event2={id = 1 ... //somedata} gets picked up almost at the same time, service method saveOrUpdateRecord() looks for the record by ID=1, finds none (previous one is mid-inserting), tries to insert and fails with constraint violation exception - should find this record and merge it with the input from the event based on my conditions.
How can i get the saveOrUpdateRecord() to run only when the previous one was fully executed to prevent such behaviour? I really dont want to slow kafka consumer down with poll size etc, i just want my service to execute one transaction at a time.
The service method:
public void saveOrUpdateRecord(Object input) {
Object output = repository.findById(input.getId));
if (output == null) {
repository.save(input);
} else {
mergeRecord(input, output);
repository.save(output);
}
}
Will #Transactional annotaion on method do the job?
Make your service thread safe.
Use this:
public synchronized void saveOrUpdateRecord(Object input) {
Object output = repository.findById(input.getId));
if (output == null) {
repository.save(input);
} else {
mergeRecord(input, output);
repository.save(output);
}
}

Spring Data / Hibernate save entity with Postgres using Insert on Conflict Update Some fields

I have a domain object in Spring which I am saving using JpaRepository.save method and using Sequence generator from Postgres to generate id automatically.
#SequenceGenerator(initialValue = 1, name = "device_metric_gen", sequenceName = "device_metric_seq")
public class DeviceMetric extends BaseTimeModel {
#Id
#GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "device_metric_gen")
#Column(nullable = false, updatable = false)
private Long id;
///// extra fields
My use-case requires to do an upsert instead of normal save operation (which I am aware will update if the id is present). I want to update an existing row if a combination of three columns (assume a composite unique) is present or else create a new row.
This is something similar to this:
INSERT INTO customers (name, email)
VALUES
(
'Microsoft',
'hotline#microsoft.com'
)
ON CONFLICT (name)
DO
UPDATE
SET email = EXCLUDED.email || ';' || customers.email;
One way of achieving the same in Spring-data that I can think of is:
Write a custom save operation in the service layer that
Does a get for the three-column and if a row is present
Set the same id in current object and do a repository.save
If no row present, do a normal repository.save
Problem with the above approach is that every insert now does a select and then save which makes two database calls whereas the same can be achieved by postgres insert on conflict feature with just one db call.
Any pointers on how to implement this in Spring Data?
One way is to write a native query insert into values (all fields here). The object in question has around 25 fields so I am looking for an another better way to achieve the same.
As #JBNizet mentioned, you answered your own question by suggesting reading for the data and then updating if found and inserting otherwise. Here's how you could do it using spring data and Optional.
Define a findByField1AndField2AndField3 method on your DeviceMetricRepository.
public interface DeviceMetricRepository extends JpaRepository<DeviceMetric, UUID> {
Optional<DeviceMetric> findByField1AndField2AndField3(String field1, String field2, String field3);
}
Use the repository in a service method.
#RequiredArgsConstructor
public class DeviceMetricService {
private final DeviceMetricRepository repo;
DeviceMetric save(String email, String phoneNumber) {
DeviceMetric deviceMetric = repo.findByField1AndField2AndField3("field1", "field", "field3")
.orElse(new DeviceMetric()); // create new object in a way that makes sense for you
deviceMetric.setEmail(email);
deviceMetric.setPhoneNumber(phoneNumber);
return repo.save(deviceMetric);
}
}
A word of advice on observability:
You mentioned that this is a high throughput use case in your system. Regardless of the approach taken, consider instrumenting timers around this save. This way you can measure the initial performance against any tunings you make in an objective way. Look at this an experiment and be prepared to pivot to other solutions as needed. If you are always reading these three columns together, ensure they are indexed. With these things in place, you may find that reading to determine update/insert is acceptable.
I would recommend using a named query to fetch a row based on your candidate keys. If a row is present, update it, otherwise create a new row. Both of these operations can be done using the save method.
#NamedQuery(name="getCustomerByNameAndEmail", query="select a from Customers a where a.name = :name and a.email = :email");
You can also use the #UniqueColumns() annotation on the entity to make sure that these columns always maintain uniqueness when grouped together.
Optional<Customers> customer = customerRepo.getCustomersByNameAndEmail(name, email);
Implement the above method in your repository. All it will do it call the query and pass the name and email as parameters. Make sure to return an Optional.empty() if there is no row present.
Customers c;
if (customer.isPresent()) {
c = customer.get();
c.setEmail("newemail#gmail.com");
c.setPhone("9420420420");
customerRepo.save(c);
} else {
c = new Customer(0, "name", "email", "5451515478");
customerRepo.save(c);
}
Pass the ID as 0 and JPA will insert a new row with the ID generated according to the sequence generator.
Although I never recommend using a number as an ID, if possible use a randomly generated UUID for the primary key, it will qurantee uniqueness and avoid any unexpected behaviour that may come with sequence generators.
With spring JPA it's pretty simple to implement this with clean java code.
Using Spring Data JPA's method T getOne(ID id), you're not querying the DB itself but you are using a reference to the DB object (proxy). Therefore when updating/saving the entity you are performing a one time operation.
To be able to modify the object Spring provides the #Transactional annotation which is a method level annotation that declares that the method starts a transaction and closes it only when the method itself ends its runtime.
You'd have to:
Start a jpa transaction
get the Db reference through getOne
modify the DB reference
save it on the database
close the transaction
Not having much visibility of your actual code I'm gonna abstract it as much as possible:
#Transactional
public void saveOrUpdate(DeviceMetric metric) {
DeviceMetric deviceMetric = metricRepository.getOne(metric.getId());
//modify it
deviceMetric.setName("Hello World!");
metricRepository.save(metric);
}
The tricky part is to not think the getOne as a SELECT from the DB. The database never gets called until the 'save' method.

Spring data Neo4j Affected row count

Considering a Spring Boot, neo4j environment with Spring-Data-neo4j-4 I want to make a delete and get an error message when it fails to delete.
My problem is since the Repository.delete() returns void I have no ideia if the delete modified anything or not.
First question: is there any way to get the last query affected lines? for example in plsql I could do SQL%ROWCOUNT
So anyway, I tried the following code:
public void deletesomething(Long somethingId) {
somethingRepository.delete(getExistingsomething(somethingId).getId());
}
private something getExistingsomething(Long somethingId, int depth) {
return Optional.ofNullable(somethingRepository.findOne(somethingId, depth))
.orElseThrow(() -> new somethingNotFoundException(somethingId));
}
In the code above I query the database to check if the value exist before I delete it.
Second question: do you recommend any different approach?
So now, just to add some complexity, I have a cluster database and db1 can only Create, Update and Delete, and db2 and db3 can only Read (this is ensured by the cluster sockets). db2 and db3 will receive the data from db1 from the replication process.
For what I seen so far replication can take up to 90s and that means that up to 90s the database will have a different state.
Looking again to the code above:
public void deletesomething(Long somethingId) {
somethingRepository.delete(getExistingsomething(somethingId).getId());
}
in debug that means:
getExistingsomething(somethingId).getId() // will hit db2
somethingRepository.delete(...) // will hit db1
and so if replication has not inserted the value in db2 this code wil throw the exception.
the second question is: without changing those sockets is there any way for me to delete and give the correct response?
This is not currently supported in Spring Data Neo4j, if you wish please open a feature request.
In the meantime, perhaps the easiest work around is to fall down to the OGM level of abstraction.
Create a class that is injected with org.neo4j.ogm.session.Session
Use the following method on Session
Example: (example is in Kotlin, which was on hand)
fun deleteProfilesByColor(color : String)
{
var query = """
MATCH (n:Profile {color: {color}})
DETACH DELETE n;
"""
val params = mutableMapOf(
"color" to color
)
val result = session.query(query, params)
val statistics = result.queryStatistics() //Use these!
}

Spring + Hibernate: Query Plan Cache Memory usage

I'm programming an application with the latest version of Spring Boot. I recently became problems with growing heap, that can not be garbage collected. The analysis of the heap with Eclipse MAT showed that, within one hour of running the application, the heap grew to 630MB and with Hibernate's SessionFactoryImpl using more than 75% of the whole heap.
Is was looking for possible sources around the Query Plan Cache, but the only thing I found was this, but that did not play out. The properties were set like this:
spring.jpa.properties.hibernate.query.plan_cache_max_soft_references=1024
spring.jpa.properties.hibernate.query.plan_cache_max_strong_references=64
The database queries are all generated by the Spring's Query magic, using repository interfaces like in this documentation. There are about 20 different queries generated with this technique. No other native SQL or HQL are used.
Sample:
#Transactional
public interface TrendingTopicRepository extends JpaRepository<TrendingTopic, Integer> {
List<TrendingTopic> findByNameAndSource(String name, String source);
List<TrendingTopic> findByDateBetween(Date dateStart, Date dateEnd);
Long countByDateBetweenAndName(Date dateStart, Date dateEnd, String name);
}
or
List<SomeObject> findByNameAndUrlIn(String name, Collection<String> urls);
as example for IN usage.
Question is: Why does the query plan cache keep growing (it does not stop, it ends in a full heap) and how to prevent this? Did anyone encounter a similar problem?
Versions:
Spring Boot 1.2.5
Hibernate 4.3.10
I've hit this issue as well. It basically boils down to having variable number of values in your IN clause and Hibernate trying to cache those query plans.
There are two great blog posts on this topic.
The first:
Using Hibernate 4.2 and MySQL in a project with an in-clause query
such as: select t from Thing t where t.id in (?)
Hibernate caches these parsed HQL queries. Specifically the Hibernate
SessionFactoryImpl has QueryPlanCache with queryPlanCache and
parameterMetadataCache. But this proved to be a problem when the
number of parameters for the in-clause is large and varies.
These caches grow for every distinct query. So this query with 6000
parameters is not the same as 6001.
The in-clause query is expanded to the number of parameters in the
collection. Metadata is included in the query plan for each parameter
in the query, including a generated name like x10_, x11_ , etc.
Imagine 4000 different variations in the number of in-clause parameter
counts, each of these with an average of 4000 parameters. The query
metadata for each parameter quickly adds up in memory, filling up the
heap, since it can't be garbage collected.
This continues until all different variations in the query parameter
count is cached or the JVM runs out of heap memory and starts throwing
java.lang.OutOfMemoryError: Java heap space.
Avoiding in-clauses is an option, as well as using a fixed collection
size for the parameter (or at least a smaller size).
For configuring the query plan cache max size, see the property
hibernate.query.plan_cache_max_size, defaulting to 2048 (easily too
large for queries with many parameters).
And second (also referenced from the first):
Hibernate internally uses a cache that maps HQL statements (as
strings) to query plans. The cache consists of a bounded map limited
by default to 2048 elements (configurable). All HQL queries are loaded
through this cache. In case of a miss, the entry is automatically
added to the cache. This makes it very susceptible to thrashing - a
scenario in which we constantly put new entries into the cache without
ever reusing them and thus preventing the cache from bringing any
performance gains (it even adds some cache management overhead). To
make things worse, it is hard to detect this situation by chance - you
have to explicitly profile the cache in order to notice that you have
a problem there. I will say a few words on how this could be done
later on.
So the cache thrashing results from new queries being generated at
high rates. This can be caused by a multitude of issues. The two most
common that I have seen are - bugs in hibernate which cause parameters
to be rendered in the JPQL statement instead of being passed as
parameters and the use of an "in" - clause.
Due to some obscure bugs in hibernate, there are situations when
parameters are not handled correctly and are rendered into the JPQL
query (as an example check out HHH-6280). If you have a query that is
affected by such defects and it is executed at high rates, it will
thrash your query plan cache because each JPQL query generated is
almost unique (containing IDs of your entities for example).
The second issue lays in the way that hibernate processes queries with
an "in" clause (e.g. give me all person entities whose company id
field is one of 1, 2, 10, 18). For each distinct number of parameters
in the "in"-clause, hibernate will produce a different query - e.g.
select x from Person x where x.company.id in (:id0_) for 1 parameter,
select x from Person x where x.company.id in (:id0_, :id1_) for 2
parameters and so on. All these queries are considered different, as
far as the query plan cache is concerned, resulting again in cache
thrashing. You could probably work around this issue by writing a
utility class to produce only certain number of parameters - e.g. 1,
10, 100, 200, 500, 1000. If you, for example, pass 22 parameters, it
will return a list of 100 elements with the 22 parameters included in
it and the remaining 78 parameters set to an impossible value (e.g. -1
for IDs used for foreign keys). I agree that this is an ugly hack but
could get the job done. As a result you will only have at most 6
unique queries in your cache and thus reduce thrashing.
So how do you find out that you have the issue? You could write some
additional code and expose metrics with the number of entries in the
cache e.g. over JMX, tune logging and analyze the logs, etc. If you do
not want to (or can not) modify the application, you could just dump
the heap and run this OQL query against it (e.g. using mat): SELECT l.query.toString() FROM INSTANCEOF org.hibernate.engine.query.spi.QueryPlanCache$HQLQueryPlanKey l. It
will output all queries currently located in any query plan cache on
your heap. It should be pretty easy to spot whether you are affected
by any of the aforementioned problems.
As far as the performance impact goes, it is hard to say as it depends
on too many factors. I have seen a very trivial query causing 10-20 ms
of overhead spent in creating a new HQL query plan. In general, if
there is a cache somewhere, there must be a good reason for that - a
miss is probably expensive so your should try to avoid misses as much
as possible. Last but not least, your database will have to handle
large amounts of unique SQL statements too - causing it to parse them
and maybe create different execution plans for every one of them.
I have same problems with many(>10000) parameters in IN-queries. The number of my parameters is always different and I can not predict this, my QueryCachePlan growing too fast.
For database systems supporting execution plan caching, there's a better chance of hitting the cache if the number of possible IN clause parameters lowers.
Fortunately Hibernate of version 5.2.18 and higher has a solution with padding of parameters in IN-clause.
Hibernate can expand the bind parameters to power-of-two: 4, 8, 16, 32, 64.
This way, an IN clause with 5, 6, or 7 bind parameters will use the 8 IN clause, therefore reusing its execution plan.
If you want to activate this feature, you need to set this property to true hibernate.query.in_clause_parameter_padding=true.
For more information see this article, atlassian.
I had the exact same problem using Spring Boot 1.5.7 with Spring Data (Hibernate) and the following config solved the problem (memory leak):
spring:
jpa:
properties:
hibernate:
query:
plan_cache_max_size: 64
plan_parameter_metadata_max_size: 32
Starting with Hibernate 5.2.12, you can specify a hibernate configuration property to change how literals are to be bound to the underlying JDBC prepared statements by using the following:
hibernate.criteria.literal_handling_mode=BIND
From the Java documentation, this configuration property has 3 settings
AUTO (default)
BIND - Increases the likelihood of jdbc statement caching using bind parameters.
INLINE - Inlines the values rather than using parameters (be careful of SQL injection).
I had a similar issue, the issue is because you are creating the query and not using the PreparedStatement. So what happens here is for each query with different parameters it creates an execution plan and caches it.
If you use a prepared statement then you should see a major improvement in the memory being used.
TL;DR: Try to replace the IN() queries with ANY() or eliminate them
Explanation:
If a query contains IN(...) then a plan is created for each amount of values inside IN(...), since the query is different each time.
So if you have IN('a','b','c') and IN ('a','b','c','d','e') - those are two different query strings/plans to cache. This answer tells more about it.
In case of ANY(...) a single (array) parameter can be passed, so the query string will remain the same and the prepared statement plan will be cached once (example given below).
Cause:
This line might cause the issue:
List<SomeObject> findByNameAndUrlIn(String name, Collection<String> urls);
as under the hood it generates different IN() queries for every amount of values in "urls" collection.
Warning:
You may have IN() query without writing it and even without knowing about it.
ORM's such as Hibernate may generate them in the background - sometimes in unexpected places and sometimes in a non-optimal ways.
So consider enabling query logs to see the actual queries you have.
Fix:
Here is a (pseudo)code that may fix issue:
query = "SELECT * FROM trending_topic t WHERE t.name=? AND t.url=?"
PreparedStatement preparedStatement = connection.prepareStatement(queryTemplate);
currentPreparedStatement.setString(1, name); // safely replace first query parameter with name
currentPreparedStatement.setArray(2, connection.createArrayOf("text", urls.toArray())); // replace 2nd parameter with array of texts, like "=ANY(ARRAY['aaa','bbb'])"
But:
Don't take any solution as a ready-to-use answer. Make sure to test the final performance on actual/big data before going to production - no matter which answer you choose.
Why? Because IN and ANY both have pros and cons, and they can bring serious performance issues if used improperly (see examples in references below). Also make sure to use parameter binding to avoid security issues as well.
References:
100x faster Postgres performance by changing 1 line - performance of Any(ARRAY[]) vs ANY(VALUES())
Index not used with =any() but used with in - different performance of IN and ANY
Understanding SQL Server query plan cache
Hope this helps. Make sure to leave a feedback whether it worked or not - in order to help people like you. Thanks!
I had a big issue with this queryPlanCache, so I did a Hibernate cache monitor to see the queries in the queryPlanCache.
I am using in QA environment as a Spring task each 5 minutes.
I found which IN queries I had to change to solve my cache problem.
A detail is: I am using Hibernate 4.2.18 and I don't know if will be useful with other versions.
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;
import org.hibernate.ejb.HibernateEntityManagerFactory;
import org.hibernate.internal.SessionFactoryImpl;
import org.hibernate.internal.util.collections.BoundedConcurrentHashMap;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.dao.GenericDAO;
public class CacheMonitor {
private final Logger logger = LoggerFactory.getLogger(getClass());
#PersistenceContext(unitName = "MyPU")
private void setEntityManager(EntityManager entityManager) {
HibernateEntityManagerFactory hemf = (HibernateEntityManagerFactory) entityManager.getEntityManagerFactory();
sessionFactory = (SessionFactoryImpl) hemf.getSessionFactory();
fillQueryMaps();
}
private SessionFactoryImpl sessionFactory;
private BoundedConcurrentHashMap queryPlanCache;
private BoundedConcurrentHashMap parameterMetadataCache;
/*
* I tried to use a MAP and use compare compareToIgnoreCase.
* But remember this is causing memory leak. Doing this
* you will explode the memory faster that it already was.
*/
public void log() {
if (!logger.isDebugEnabled()) {
return;
}
if (queryPlanCache != null) {
long cacheSize = queryPlanCache.size();
logger.debug(String.format("QueryPlanCache size is :%s ", Long.toString(cacheSize)));
for (Object key : queryPlanCache.keySet()) {
int filterKeysSize = 0;
// QueryPlanCache.HQLQueryPlanKey (Inner Class)
Object queryValue = getValueByField(key, "query", false);
if (queryValue == null) {
// NativeSQLQuerySpecification
queryValue = getValueByField(key, "queryString");
filterKeysSize = ((Set) getValueByField(key, "querySpaces")).size();
if (queryValue != null) {
writeLog(queryValue, filterKeysSize, false);
}
} else {
filterKeysSize = ((Set) getValueByField(key, "filterKeys")).size();
writeLog(queryValue, filterKeysSize, true);
}
}
}
if (parameterMetadataCache != null) {
long cacheSize = parameterMetadataCache.size();
logger.debug(String.format("ParameterMetadataCache size is :%s ", Long.toString(cacheSize)));
for (Object key : parameterMetadataCache.keySet()) {
logger.debug("Query:{}", key);
}
}
}
private void writeLog(Object query, Integer size, boolean b) {
if (query == null || query.toString().trim().isEmpty()) {
return;
}
StringBuilder builder = new StringBuilder();
builder.append(b == true ? "JPQL " : "NATIVE ");
builder.append("filterKeysSize").append(":").append(size);
builder.append("\n").append(query).append("\n");
logger.debug(builder.toString());
}
private void fillQueryMaps() {
Field queryPlanCacheSessionField = null;
Field queryPlanCacheField = null;
Field parameterMetadataCacheField = null;
try {
queryPlanCacheSessionField = searchField(sessionFactory.getClass(), "queryPlanCache");
queryPlanCacheSessionField.setAccessible(true);
queryPlanCacheField = searchField(queryPlanCacheSessionField.get(sessionFactory).getClass(), "queryPlanCache");
queryPlanCacheField.setAccessible(true);
parameterMetadataCacheField = searchField(queryPlanCacheSessionField.get(sessionFactory).getClass(), "parameterMetadataCache");
parameterMetadataCacheField.setAccessible(true);
queryPlanCache = (BoundedConcurrentHashMap) queryPlanCacheField.get(queryPlanCacheSessionField.get(sessionFactory));
parameterMetadataCache = (BoundedConcurrentHashMap) parameterMetadataCacheField.get(queryPlanCacheSessionField.get(sessionFactory));
} catch (Exception e) {
logger.error("Failed fillQueryMaps", e);
} finally {
queryPlanCacheSessionField.setAccessible(false);
queryPlanCacheField.setAccessible(false);
parameterMetadataCacheField.setAccessible(false);
}
}
private <T> T getValueByField(Object toBeSearched, String fieldName) {
return getValueByField(toBeSearched, fieldName, true);
}
#SuppressWarnings("unchecked")
private <T> T getValueByField(Object toBeSearched, String fieldName, boolean logErro) {
Boolean accessible = null;
Field f = null;
try {
f = searchField(toBeSearched.getClass(), fieldName, logErro);
accessible = f.isAccessible();
f.setAccessible(true);
return (T) f.get(toBeSearched);
} catch (Exception e) {
if (logErro) {
logger.error("Field: {} error trying to get for: {}", fieldName, toBeSearched.getClass().getName());
}
return null;
} finally {
if (accessible != null) {
f.setAccessible(accessible);
}
}
}
private Field searchField(Class<?> type, String fieldName) {
return searchField(type, fieldName, true);
}
private Field searchField(Class<?> type, String fieldName, boolean log) {
List<Field> fields = new ArrayList<Field>();
for (Class<?> c = type; c != null; c = c.getSuperclass()) {
fields.addAll(Arrays.asList(c.getDeclaredFields()));
for (Field f : c.getDeclaredFields()) {
if (fieldName.equals(f.getName())) {
return f;
}
}
}
if (log) {
logger.warn("Field: {} not found for type: {}", fieldName, type.getName());
}
return null;
}
}
We also had a QueryPlanCache with growing heap usage. We had IN-queries which we rewrote, and additionally we have queries which use custom types. Turned out that the Hibernate class CustomType didn't properly implement equals and hashCode thereby creating a new key for every query instance. This is now solved in Hibernate 5.3.
See https://hibernate.atlassian.net/browse/HHH-12463.
You still need to properly implement equals/hashCode in your userTypes to make it work properly.
We had faced this issue with query plan cache growing too fast and old gen heap was also growing along with it as gc was unable to collect it.The culprit was JPA query taking some more than 200000 ids in the IN clause. To optimise the query we used joins instead of fetching ids from one table and passing those in other table select query..

Using "Any" or "Contains" when context not saved yet

Why isn't the exception triggered? Linq's "Any()" is not considering the new entries?
MyContext db = new MyContext();
foreach (string email in {"asdf#gmail.com", "asdf#gmail.com"})
{
Person person = new Person();
person.Email = email;
if (db.Persons.Any(p => p.Email.Equals(email))
{
throw new Exception("Email already used!");
}
db.Persons.Add(person);
}
db.SaveChanges()
Shouldn't the exception be triggered on the second iteration?
The previous code is adapted for the question, but the real scenario is the following:
I receive an excel of persons and I iterate over it adding every row as a person to db.Persons, checking their emails aren't already used in the db. The problem is when there are repeated emails in the worksheet itself (two rows with the same email)
Yes - queries (by design) are only computed against the data source. If you want to query in-memory items you can also query the Local store:
if (db.Persons.Any(p => p.Email.Equals(email) ||
db.Persons.Local.Any(p => p.Email.Equals(email) )
However - since YOU are in control of what's added to the store wouldn't it make sense to check for duplicates in your code instead of in EF? Or is this just a contrived example?
Also, throwing an exception for an already existing item seems like a poor design as well - exceptions can be expensive, and if the client does not know to catch them (and in this case compare the message of the exception) they can cause the entire program to terminate unexpectedly.
A call to db.Persons will always trigger a database query, but those new Persons are not yet persisted to the database.
I imagine if you look at the data in debug, you'll see that the new person isn't there on the second iteration. If you were to set MyContext db = new MyContext() again, it would be, but you wouldn't do that in a real situation.
What is the actual use case you need to solve? This example doesn't seem like it would happen in a real situation.
If you're comparing against the db, your code should work. If you need to prevent dups being entered, it should happen elsewhere - on the client or checking the C# collection before you start writing it to the db.

Resources