Spring Data JPA projection performance

This projection:
public interface IDate {
    UUID getId();
    Long getLatestTime();

    // Convert the raw epoch-millis value into a DateTime
    default DateTime getLatestDate() {
        Long maximumTimeLastModified = getLatestTime();
        Date maxDate = new Date(maximumTimeLastModified.longValue());
        return new DateTime(maxDate);
    }
}
was created and added to the JPA Repository:
List<IDate> findLatestDates(Set<UUID> ids);
Functionally this works perfectly and is very clean. However, performance was poor: it took nearly twice as long as simply returning List<Object[]> and processing those results in Java. Specifically, a web request took 12 seconds to complete with the projection but only 7 seconds without it. Does anyone know why, and whether there is a way to improve it? In general, are there known performance impacts of using projections that everyone should be aware of?
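For comparison, the faster List<Object[]> baseline mentioned above might look roughly like this (a sketch only: the entity, field, and method names here are made up, not taken from the original code):

@Query("select e.id, max(e.timeLastModified) from MyEntity e where e.id in :ids group by e.id")
List<Object[]> findLatestTimesRaw(@Param("ids") Set<UUID> ids);

// Caller maps each raw row by hand instead of going through a projection proxy
Map<UUID, DateTime> latest = new HashMap<>();
for (Object[] row : repository.findLatestTimesRaw(ids)) {
    latest.put((UUID) row[0], new DateTime(new Date((Long) row[1])));
}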

Related

Batch update operation using JDBC Template

I have a service where I have to update multiple rows. I was testing with a batch of 2000 rows. Using CrudRepository's saveAll(), the update operation was taking 211 seconds.
After looking into JdbcTemplate I came across this implementation: https://mkyong.com/spring/spring-jdbctemplate-batchupdate-example/
My implementation of it:
@Transactional
public int[][] batchUpdateBseStatus(List<ExchangeTradeStatus> users, int batchSize) {
    int[][] updateCounts = jdbcTemplate.batchUpdate(
        "update exchange_trade_status set bse_status = ? where id = ?",
        users,
        batchSize,
        new ParameterizedPreparedStatementSetter<ExchangeTradeStatus>() {
            public void setValues(PreparedStatement ps, ExchangeTradeStatus user)
                    throws SQLException {
                ps.setString(1, user.getBseStatus().name());
                ps.setInt(2, user.getId());
            }
        });
    return updateCounts;
}
For the same update process it now takes about 105 seconds. Reading more about JDBC batch updates, I saw a similar implementation to mine whose author had published noticeably faster timings. My time is pretty slow compared to that. Is there any fundamental flaw in my understanding and final implementation of batchUpdate, and how can I improve my time?
Update:
I used these two properties and they brought the update time down to 1.297 seconds for 1970 rows:
spring.datasource.hikari.data-source-properties.useConfigs=maxPerformance
spring.datasource.hikari.data-source-properties.rewriteBatchedStatements=true
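Both flags are MySQL Connector/J settings that Hikari passes through to the driver; rewriteBatchedStatements=true lets the driver rewrite a JDBC batch into far fewer network round trips, which is where most of the gain comes from. The same flags can also go directly on the JDBC URL (a sketch, with a made-up host and schema):
spring.datasource.url=jdbc:mysql://localhost:3306/mydb?useConfigs=maxPerformance&rewriteBatchedStatements=true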

Benchmarking Spring Data vs JDBI for selects from a Postgres database

I wanted to compare the performance of Spring Data vs JDBI.
I used the following versions:
Spring Boot 2.2.4.RELEASE
vs
JDBI 3.13.0
The test is fairly simple: select * from the admins table and convert the rows to a list of Admin objects.
Here are the relevant details.
With Spring Boot:
public interface AdminService extends JpaRepository<Admin, Integer> {
}
And for JDBI:
public List<Admin> getAdmins() {
    String sql = "Select admin_id as adminId, username from admins";
    Handle handle = null;
    try {
        handle = Sql2oConnection.getInstance().getJdbi().open();
        return handle.createQuery(sql).mapToBean(Admin.class).list();
    } catch (Exception ex) {
        log.error("Could not select admins from admins: {}", ex.getMessage(), ex);
        return null;
    } finally {
        handle.close();
    }
}
The test class is executed using JUnit 5:
@Test
@DisplayName("How long does it take to run 1000 queries")
public void loadAdminTable() {
    System.out.println("Running load test");
    Instant start = Instant.now();
    for (int i = 0; i < 1000; i++) {
        List<Admin> admins = adminService.getAdmins(); // for Spring Data it's findAll()
        for (Admin admin : admins) {
            if (admin.getAdminId() == 654) {
                System.out.println("just to simulate work with the data");
            }
        }
    }
    Instant end = Instant.now();
    Duration duration = Duration.between(start, end);
    System.out.println("Total duration: " + duration.getSeconds());
}
I was quite shocked to get the following results:
Spring Data: 2 seconds
JDBI: 59 seconds
Any idea why I got these results? I was expecting JDBI to be faster.
The issue was that Spring manages the connection life cycle for us, and for a good reason. From the JDBI docs:
There is a performance penalty every time a connection is allocated
and released. In the example above, the two insertFullContact
operations take separate Connection objects from your database
connection pool.
I changed the JDBI test code to the following:
@Test
@DisplayName("How long does it take to run 1000 queries")
public void loadAdminTable() {
    System.out.println("Running load test");
    String sql = "Select admin_id as adminId, username from admins";
    Handle handle = Sql2oConnection.getInstance().getJdbi().open();
    Instant start = Instant.now();
    for (int i = 0; i < 1000; i++) {
        List<Admin> admins = handle.createQuery(sql).mapToBean(Admin.class).list();
        if (!admins.isEmpty()) {
            for (Admin admin : admins) {
                System.out.println(admin.getUsername());
            }
        }
    }
    handle.close();
    Instant end = Instant.now();
    Duration duration = Duration.between(start, end);
    System.out.println("Total duration: " + duration.getSeconds());
}
This way the connection is opened once and the query runs 1000 times.
The final result was 1 second, twice as fast as Spring.
On the one hand, you seem to be making some basic benchmarking mistakes:
You are not warming up the JVM.
You are not using the results in any way.
Therefore what you are seeing might just be the effect of different VM optimisations.
Look into JMH in order to improve your benchmarks.
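A minimal JMH skeleton for this kind of comparison could look like the following (a sketch only: BenchmarkApplication stands in for your Spring Boot application class, and the iteration counts are arbitrary):

import java.util.List;
import org.openjdk.jmh.annotations.*;
import org.springframework.boot.SpringApplication;
import org.springframework.context.ConfigurableApplicationContext;

@State(Scope.Benchmark)
public class AdminQueryBenchmark {

    private ConfigurableApplicationContext context;
    private AdminService adminService;

    @Setup(Level.Trial)
    public void setUp() {
        // Boot the application once per trial and grab the repository bean
        context = SpringApplication.run(BenchmarkApplication.class);
        adminService = context.getBean(AdminService.class);
    }

    @TearDown(Level.Trial)
    public void tearDown() {
        context.close();
    }

    @Benchmark
    @Warmup(iterations = 3)
    @Measurement(iterations = 5)
    public List<Admin> springDataFindAll() {
        // Returning the list hands it to JMH's blackhole, so the JIT cannot elide the query
        return adminService.findAll();
    }
}

JMH handles the warm-up iterations and result consumption that the hand-rolled loop above leaves out.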
Benchmarks involving an external resource are extra hard, because you have many more parameters to control.
One big question, for example, is whether the connection to the database is realistically slow: in most production systems the database will be on a different machine, at least virtually, and quite possibly on different hardware.
Is that true in your test as well?
Assuming your results are real, the next step is to investigate where the extra time gets spent.
I would expect most of the time to be spent executing the SQL statements and transferring the results over the network.
Therefore you should inspect what SQL statements actually get executed.
This might point you to one possible answer: that JPA is doing lots of lazy loading and hasn't even loaded most of what you really need.
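One common way to see the executed statements in a Spring Boot application is to turn on Hibernate's SQL and parameter logging (standard Spring Boot / Hibernate 5 settings; adjust to your setup):

logging.level.org.hibernate.SQL=DEBUG
logging.level.org.hibernate.type.descriptor.sql.BasicBinder=TRACE
spring.jpa.properties.hibernate.format_sql=true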

Spring data Page/Pageable returns duplicates on large data sets?

When operating on large data sets, Spring Data presents two abstractions: Stream and Page. We've been using Stream for a while with no issues, but recently I wanted to try a paginated approach and ran into a reliability issue.
Consider the following:
@Entity
public class MyData {
    // fields omitted
}
public interface MyDataRepository extends JpaRepository<MyData, UUID> {
}
@Component
public class MyDataService {
    private MyDataRepository repository;

    // Bridge between a reactive service and a transactional / non-reactive database call
    @Transactional
    public void getAllMyData(final FluxSink<MyData> sink) {
        final Pageable firstPage = PageRequest.of(0, 500);
        Page<MyData> page = repository.findAll(firstPage);
        while (page != null && page.hasContent()) {
            page.getContent().forEach(sink::next);
            if (page.hasNext()) {
                page = repository.findAll(page.nextPageable());
            } else {
                page = null;
            }
        }
        sink.complete();
    }
}
Using two Postgres 9.5 databases, the source database had close to 100,000 rows while the destination was empty. The example code was then used to copy from the source to the destination. At the end I would find that my destination database had a far smaller row count than the source.
Run as a Spring Boot app
The flux doing the copy was using 4-6 threads in parallel (for speed)
Total run time of at least an hour (max was 2 hours)
As it turns out, I was processing the same rows multiple times (and missing other rows as a result). This led me to a fix that others had already run into, where you should provide a Sort.by("") argument.
After changing the service to use:
// Make our pages sorted by the PKEY
final Pageable firstPage = PageRequest.of(0, 500, Sort.by("id"));
I found that while it GREATLY helped, I would still process some rows multiple times (going from losing about half the rows to seeing only ~12 duplicates). When I use a Stream instead, I have no issues.
Does anyone have an explanation for what is going on? I don't see any duplicates come through until the test has been running for at least 10-15 minutes, which almost leads me to believe there is some kind of session or other timeout (either in the client or on the database) causing the hiccups. But I'm really far out of my knowledge area for troubleshooting it further.
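For reference, the Stream-based variant mentioned above (the one that showed no duplicates) can look roughly like this; a sketch only, and the derived method name is an assumption:

public interface MyDataRepository extends JpaRepository<MyData, UUID> {
    // Spring Data can return a Stream backed by an open database cursor
    Stream<MyData> findAllByOrderByIdAsc();
}

// The stream must be consumed inside a transaction and closed when done
@Transactional(readOnly = true)
public void getAllMyData(final FluxSink<MyData> sink) {
    try (Stream<MyData> stream = repository.findAllByOrderByIdAsc()) {
        stream.forEach(sink::next);
    }
    sink.complete();
}

Because the streamed read runs as a single query holding one cursor, it sees one consistent snapshot, whereas each page request is a separate query; that difference is one plausible reason the two approaches behave differently here.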

protobuf-net concurrent performance issue in TakeLock

We're using protobuf-net for sending log messages between services. When profiling a stress test under high concurrency, we see very high CPU usage, and TakeLock in RuntimeTypeModel is the culprit. The hot call stack looks something like:
*Our code...*
ProtoBuf.Serializer.SerializeWithLengthPrefix(class System.IO.Stream,!!0,valuetype ProtoBuf.PrefixStyle)
ProtoBuf.Serializer.SerializeWithLengthPrefix(class System.IO.Stream,!!0,valuetype ProtoBuf.PrefixStyle,int32)
ProtoBuf.Meta.TypeModel.SerializeWithLengthPrefix(class System.IO.Stream,object,class System.Type,valuetype ProtoBuf.PrefixStyle,int32)
ProtoBuf.Meta.TypeModel.SerializeWithLengthPrefix(class System.IO.Stream,object,class System.Type,valuetype ProtoBuf.PrefixStyle,int32,class ProtoBuf.SerializationContext)
ProtoBuf.ProtoWriter.WriteObject(object,int32,class ProtoBuf.ProtoWriter,valuetype ProtoBuf.PrefixStyle,int32)
ProtoBuf.BclHelpers.WriteNetObject(object,class ProtoBuf.ProtoWriter,int32,valuetype ProtoBuf.BclHelpers/NetObjectOptions)
ProtoBuf.Meta.TypeModel.GetKey(class System.Type&)
ProtoBuf.Meta.RuntimeTypeModel.GetKey(class System.Type,bool,bool)
ProtoBuf.Meta.RuntimeTypeModel.FindOrAddAuto(class System.Type,bool,bool,bool)
ProtoBuf.Meta.RuntimeTypeModel.TakeLock(int32&)
[clr.dll]
I see that we can use the new precompiler to get a speed boost, but I'm wondering if that will get rid of the issue (sounds like it doesn't use reflection); it would be a bit of work for me to integrate this, so I haven't tested it yet. I also see the option to call Serializer.PrepareSerializer. My initial (small scale) testing didn't make the prepare seem promising.
A little more info about the type we're serializing:
[ProtoContract]
public class SomeMessage
{
    [ProtoMember(1)]
    public SomeEnumType SomeEnum { get; set; }

    [ProtoMember(2)]
    public long SomeId { get; set; }

    [ProtoMember(3)]
    public string SomeString { get; set; }

    [ProtoMember(4)]
    public DateTime SomeDate { get; set; }

    [ProtoMember(5, DynamicType = true, OverwriteList = true)]
    public Collection<object> SomeArguments { get; set; }
}
Thanks for your help!
UPDATE 9/17
Thanks for your response! We're going to try the workaround you suggest and see if that fixes things.
This code lives in our logging system so, in the SomeMessage example, SomeString is really a format string (e.g. "Hello {0}") and the SomeArguments collection is a list of objects used to fill in the format string, just like String.Format. Before we serialize, we look at each argument and call DynamicSerializer.IsKnownType(argument.GetType()), if it isn't known, we convert it to a string first. I haven't looked at the ratios of data, but I'm pretty sure we have a lot of different strings coming in as arguments.
Let me know if this helps. If you need, I'll try to get more details.
TakeLock is only used when the model is changing, for example because it is seeing a type for the first time. You shouldn't normally see TakeLock after the first time a particular type has been used. In most cases, using Serializer.PrepareSerializer<SomeMessage>() should perform all the necessary initialization (and similarly for any other contracts you are using).
However! I wonder if perhaps this is also related to your use of DynamicType; what are the actual objects being used here? It might be that I need to tweak the logic here, so that it doesn't spend any time on that step. If you let me know the actual objects (so I can repro), I will try to run some tests.
As for whether the precompiler would change this; yes it would. A fully compiled static model has a completely different implementation of the ProtoBuf.Meta.TypeModel.GetKey method, so it would never call TakeLock (you don't need to protect a model that can never change!). But you can actually do something very similar without needing to use precompile. Consider the following, run as part of your app's initialization:
static readonly TypeModel serializer;
...
var model = TypeModel.Create();
model.Add(typeof(SomeMessage), true);
// TODO add other contracts you use here
serializer = model.Compile();
This will create a fully static-compiled serializer assembly in memory (instead of a mutable model with individual operations compiled). If you now use serializer.Serialize(...) instead of Serializer.Serialize (i.e. the instance method on your stored TypeModel rather than the static method on Serializer) then it will essentially be doing something very similar to "precompiler", but without the need to actually precompile it (obviously this will only be available on "full" .NET). This will then never call TakeLock, as it is running a fixed model rather than a flexible model. It does, however, require you to know what contract-types you use. You could use reflection to find these, by looking for all those types with a given attribute:
static readonly TypeModel serializer;
...
var model = TypeModel.Create();
Type attributeType = typeof(ProtoContractAttribute);
foreach (var type in typeof(SomeMessage).Assembly.GetTypes()) {
    if (Attribute.IsDefined(type, attributeType)) {
        model.Add(type, true);
    }
}
serializer = model.Compile();
But emphasis: the above is a workaround; it sounds like there's a glitch, which I'll happily investigate if I can see an example where it actually happens; most importantly: what are the objects in SomeArguments?

Can someone help me understand Guava CacheLoader?

I'm new to Google's Guava library and am interested in its Caching package. Currently I have version 10.0.1 downloaded. After reviewing the documentation, the JUnit test source code, and even after searching Google extensively, I still can't figure out how to use the Caching package. The documentation is very short, as if it was written for someone who has already been using Guava's library, not for a newbie like me. I just wish there were more real-world examples of how to use the Caching package properly.
Let's say I want to build a cache of 10 non-expiring items with a Least Recently Used (LRU) eviction policy. From the example found in the API, I built my code like the following:
Cache<String, String> mycache = CacheBuilder.newBuilder()
    .maximumSize(10)
    .build(
        new CacheLoader<String, String>() {
            public String load(String key) throws Exception {
                return something; // ?????
            }
        });
Since the CacheLoader is required, I have to include it in the build method of CacheBuilder. But I don't know how to return the proper value from mycache.
To add item to mycache, I use the following code:
mycache.asMap().put("key123", "value123");
To get item from mycache, I use this method:
mycache.get("key123")
The get method will always return whatever value I returned from CacheLoader's load method instead of getting the value from mycache. Could someone kindly tell me what I missed?
Guava's Cache type is generally intended to be used as a computing cache. You don't usually add values to it manually. Rather, you tell it how to load the expensive-to-calculate value for a key by giving it a CacheLoader that contains the necessary code.
A typical example is loading a value from a database or doing an expensive calculation.
private final FooDatabase fooDatabase = ...;
private final LoadingCache<Long, Foo> cache = CacheBuilder.newBuilder()
    .maximumSize(10)
    .build(new CacheLoader<Long, Foo>() {
        public Foo load(Long id) {
            return fooDatabase.getFoo(id);
        }
    });

public Foo getFoo(long id) {
    // never need to manually put a Foo in... it will be loaded from the DB if needed
    return cache.getUnchecked(id);
}
Also, I tried the example you gave and mycache.get("key123") returned "value123" as expected.
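As a side note, if you really do want to put entries in by hand rather than compute them, newer Guava versions (11.0 and later) let you build a plain Cache without a CacheLoader and use put/getIfPresent directly (a minimal sketch):

// No CacheLoader: a manual cache with size-based (LRU-style) eviction
Cache<String, String> mycache = CacheBuilder.newBuilder()
    .maximumSize(10)
    .build();

mycache.put("key123", "value123");
String value = mycache.getIfPresent("key123"); // "value123", or null if absent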
