Yesterday I got access to a new project at my company and found this:
public List<User> findActiveUsers() {
    // Loads every user from the database, then filters in memory
    return this.userRepository.findAll().stream()
            .filter(u -> u.isActive())
            .collect(Collectors.toList());
}
Is this a good way to find all the active users? Or should it be done in a repository like this?
public interface UserRepository extends JpaRepository<User, Long> {
    @Query("SELECT user FROM User user WHERE user.active = true")
    List<User> findActiveUsers();
}
And if the first solution is correct, what about performance?
First, both options fulfill the requirement.
However, option 2 makes more sense because it filters the data at the query level rather than in Java, so only the matching rows are transferred and mapped to entities. I believe the second option performs better, though I have no measurements to back this up; the comment on performance is based on my experience.
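As a rough illustration of that difference, here is a plain-Java sketch with no Spring or database involved. `FakeUserStore` is a hypothetical stand-in for a users table that counts how many rows it hands to the application; `findAll()` plays the role of option 1 and `findActive()` the role of a `WHERE` clause evaluated by the database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for a users table; not a Spring Data repository.
final class FakeUserStore {
    record User(long id, boolean active) {}

    private final List<User> rows = new ArrayList<>();
    private int rowsMaterialized = 0; // rows handed to the application so far

    // Creates `total` users; every `everyNthActive`-th user is active.
    FakeUserStore(int total, int everyNthActive) {
        for (int i = 1; i <= total; i++) {
            rows.add(new User(i, i % everyNthActive == 0));
        }
    }

    // Option 1: the store returns every row; the caller filters afterwards.
    List<User> findAll() {
        rowsMaterialized += rows.size();
        return new ArrayList<>(rows);
    }

    // Option 2: the store applies the predicate and returns only the matches.
    List<User> findActive() {
        List<User> result = rows.stream()
                .filter(User::active)
                .collect(Collectors.toList());
        rowsMaterialized += result.size();
        return result;
    }

    int rowsMaterialized() {
        return rowsMaterialized;
    }
}
```

With 1,000 rows of which 100 are active, the first path hands 1,000 objects to the application and the second only 100. Against a real database the gap also includes network transfer and entity mapping, which this sketch does not model.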
You can also consider whether a cache (@Cacheable) can be used. That depends purely on the use case, i.e. how frequently the User entity changes and how often you would like to refresh the cache.
One disadvantage of using a native query is that Spring Data JPA currently doesn't support dynamic sorting for native queries.
Please refer to the similar question linked below, although it is mostly about Hibernate. There, option 3 (i.e. the @Query approach) is clearly preferred.
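To show what the refresh trade-off looks like, here is a minimal plain-Java sketch of the caching idea. It is not how Spring implements @Cacheable; `ActiveUsersCache`, its `loader` and the `"activeUsers"` key are hypothetical names chosen for illustration.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical helper illustrating cache-then-evict; not Spring's cache abstraction.
final class ActiveUsersCache {
    private final ConcurrentHashMap<String, List<String>> cache = new ConcurrentHashMap<>();
    private final Supplier<List<String>> loader; // stands in for the repository call
    private final AtomicInteger loads = new AtomicInteger();

    ActiveUsersCache(Supplier<List<String>> loader) {
        this.loader = loader;
    }

    // Returns the cached list, invoking the loader only when nothing is cached.
    List<String> activeUsers() {
        return cache.computeIfAbsent("activeUsers", key -> {
            loads.incrementAndGet();
            return loader.get();
        });
    }

    // Call this whenever a user changes, so the next read reloads the list.
    void evict() {
        cache.clear();
    }

    int loadCount() {
        return loads.get();
    }
}
```

The more often users change, the more often `evict()` has to run and the less the cache saves, which is the use-case dependency mentioned above.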
Spring Data Repository with ORM, EntityManager, #Query, what is the most elegant way to deal with custom SQL queries?
Related
I'm new to Spring Boot and I just started using graphql-spqr for Spring Boot since it allows for easy bootstrapping of Java projects.
However, as I understand it, GraphQL basically allows fetching selected fields from the database. In the examples I've seen, this kind of selection in the graphql-spqr library happens on the client side. Is there a way to do the selection both client-side and server-side so as to speed up the queries?
I've looked into EntityGraph examples for GraphQL but they are mostly implemented for complex queries that involve JOINs. However, nothing exists for simple queries like findAll(), findById() etc.
I would like to use findAll() with the server fetching only the fields as requested by the client. How can I do that?
What was said in the comments is correct: GraphQL (and hence SPQR, as it's merely a tool to hook the schema up) does not know anything about SQL, databases, JOINs or anything else. It's a communication protocol, the rest is up to you.
As for your situation, you'd have to inject the subselection into the resolver and pass it down to SQL. In the simplest case, it can look like this (in pseudo code):
public List<Book> books(@GraphQLEnvironment Set<String> fields) {
    // pass the requested field names further, joined into a column list
    return database.query("SELECT " + String.join(", ", fields) + " FROM book");
}
You can inject ResolutionEnvironment using the same annotation, in case you need the full context.
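If the field names end up in an SQL string, it may be worth mapping them through a list of known columns first. The following is a minimal sketch under assumptions: `BookQueryBuilder`, `selectFor` and the column names are hypothetical and not part of graphql-spqr.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper; the column names are assumed for illustration only.
final class BookQueryBuilder {
    private static final List<String> ALLOWED_COLUMNS = List.of("id", "title", "author", "isbn");

    // Builds "SELECT a, b FROM book" from the requested fields, keeping only known columns.
    static String selectFor(Set<String> requestedFields) {
        String columns = ALLOWED_COLUMNS.stream()
                .filter(requestedFields::contains)
                .collect(Collectors.joining(", "));
        if (columns.isEmpty()) {
            columns = "id"; // fall back to the key when no known column was requested
        }
        return "SELECT " + columns + " FROM book";
    }
}
```

For example, `BookQueryBuilder.selectFor(Set.of("title", "id"))` yields `SELECT id, title FROM book`, and unknown names are silently ignored instead of being concatenated into the statement.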
We are about to build a SaaS application and are now deciding on the technology stack. We have developed earlier applications with Spring Boot and Hibernate, so our team is currently thinking of using the same stack for the new product.
But here are our concerns.
The applications we built earlier were all client-based applications without heavy traffic. The application we are planning to build is a cloud-based product, and the expected traffic will be very high.
It will be a multi-tenant application. Depending on growth, we may need to scale the resources horizontally. As we are planning to use cloud infrastructure, we need to be able to optimize the queries in depth.
We should have the option to customize the second-level cache in depth.
We can't let the framework fire queries on its own; we need complete control over it. (E.g. in Hibernate, child objects get loaded automatically when they are accessed.)
With all these points in mind, will Hibernate serve the purpose? Or will it become very challenging to enhance or customize once the product grows? Are there other frameworks better suited to high-traffic scaling? Or should we write the entire layer on our own?
Any suggestions?
Sure, Hibernate can be used for such scenarios. If you want to be in control of the queries, you should be using a DTO approach though to avoid lazy loading.
Coupled with a query builder like the one Blaze-Persistence provides, you can also make use of the more advanced features of your database. You could then use Blaze-Persistence Entity Views for DTOs, which is a library I created to allow easy mapping between JPA models and custom interface or abstract class defined models, something like Spring Data Projections on steroids. The idea is that you define your target structure (domain model) the way you like and map attributes (getters) via JPQL expressions to the entity model.
A sample DTO model could look like the following with Blaze-Persistence Entity-Views:
@EntityView(User.class)
public interface UserDto {
    @IdMapping
    Long getId();
    String getName();
    Set<RoleDto> getRoles();
    @EntityView(Role.class)
    interface RoleDto {
        @IdMapping
        Long getId();
        String getName();
    }
}
Querying is a matter of applying the entity view to a query, the simplest being just a query by id.
UserDto a = entityViewManager.find(entityManager, UserDto.class, id);
The Spring Data integration allows you to use it almost like Spring Data Projections: https://persistence.blazebit.com/documentation/entity-view/manual/en_US/index.html#spring-data-features
Page<UserDto> findAll(Pageable pageable);
The best part is, it will only fetch the state that is actually necessary!
I'm using Spring Boot 2.3.1 and Spring Data to access PostgreSQL. I have the following simple controller:
@RestController
public class OrgsApiImpl implements OrgsApi {
    @Autowired
    Orgs repository;
    @Override
    public ResponseEntity<List<OrgEntity>> listOrgs(@Valid Optional<Integer> pageLimit,
            @Valid Optional<String> pageCursor, @Valid Optional<List<String>> domainId,
            @Valid Optional<List<String>> userId) {
        List<OrgEntity> orgs;
        if (domainId.isPresent() && userId.isPresent()) {
            orgs = repository.findAllByDomainIdInAndUserIdIn(domainId.get(), userId.get());
        } else if (domainId.isPresent()) {
            orgs = repository.findAllByDomainIdIn(domainId.get());
        } else if (userId.isPresent()) {
            orgs = repository.findAllByUserIdIn(userId.get());
        } else {
            // no filter given: return everything
            orgs = repository.findAll();
        }
        return ResponseEntity.ok(orgs);
    }
}
And a simple JPA repository:
public interface Orgs extends JpaRepository<OrgEntity, String> {
    List<OrgEntity> findAllByDomainIdIn(List<String> domainIds);
    List<OrgEntity> findAllByUserIdIn(List<String> userIds);
    List<OrgEntity> findAllByDomainIdInAndUserIdIn(List<String> domainIds, List<String> userIds);
}
The code above has several obvious issues:
If the number of query parameters grows, this if chain grows very fast and becomes too hard to maintain. Question: Is there any way to build a query with a dynamic number of parameters?
This code doesn't contain a mechanism to support a cursor. Question: Is there any tool in Spring Data to support cursor-based queries?
The second question may well resolve itself once the first one is answered.
Thank you in advance!
tl;dr
It's all in the reference documentation.
Details
Spring Data modules pretty broadly support Querydsl to build dynamic queries as documented in the reference documentation. For Spring Data JPA in particular, there's also support for Specifications on top of the JPA Criteria API. For simple permutations, query by example might be an option, too.
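The composition idea behind Specifications can be sketched in plain Java. This is only an analogy under assumptions: `java.util.function.Predicate` stands in for Spring Data JPA's `Specification`, the filtering happens in memory rather than in SQL, and `OrgFilters` and `Org` are hypothetical names. The point is that each optional parameter adds one condition, so the code grows linearly instead of with every combination.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical sketch; Predicate plays the role a Specification would play in Spring Data JPA.
final class OrgFilters {
    record Org(String id, String domainId, String userId) {}

    // Each parameter contributes one condition only when it is present.
    static Predicate<Org> build(Optional<List<String>> domainIds,
                                Optional<List<String>> userIds) {
        Predicate<Org> filter = org -> true; // no parameters: match everything
        if (domainIds.isPresent()) {
            List<String> ids = domainIds.get();
            filter = filter.and(org -> ids.contains(org.domainId()));
        }
        if (userIds.isPresent()) {
            List<String> ids = userIds.get();
            filter = filter.and(org -> ids.contains(org.userId()));
        }
        return filter;
    }
}
```

With real Specifications the repository would additionally extend `JpaSpecificationExecutor`, and the combined specification would be passed to its `findAll` method so that the conditions are evaluated by the database.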
As for the second question, Spring Data repositories support streaming over results. That said, assuming you'd like to do this for performance reasons, JPA might not be the best fit in the first place, as it'll still keep processed items around due to its entity lifecycle model. If it's just about accessing subsets of the results page by page or slice by slice, that's supported, too.
For even more efficient streaming over large data sets, it's advisable to resort to plain SQL either via jOOQ (which can be used with any Spring Data module supporting relational databases), Spring Data JDBC or even Spring Data R2DBC if reactive programming is an option.
You can use the spring-dynamic-jpa library to write a query template.
The query template is built into different query strings before execution, depending on the parameters you pass when you invoke the method.
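To illustrate the general idea of assembling a query string from whichever parameters are present, here is a hand-written plain-Java sketch. It does not use or imitate the spring-dynamic-jpa API; `OrgQueryTemplate`, `Built` and the JPQL fragments are assumptions based on the `OrgEntity` fields from the question.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of dynamic query assembly; not the spring-dynamic-jpa API.
final class OrgQueryTemplate {
    // The query text plus the named parameters that have to be bound to it.
    record Built(String jpql, Map<String, Object> params) {}

    static Built build(Optional<List<String>> domainIds, Optional<List<String>> userIds) {
        List<String> conditions = new ArrayList<>();
        Map<String, Object> params = new LinkedHashMap<>();
        domainIds.ifPresent(ids -> {
            conditions.add("o.domainId IN :domainIds");
            params.put("domainIds", ids);
        });
        userIds.ifPresent(ids -> {
            conditions.add("o.userId IN :userIds");
            params.put("userIds", ids);
        });
        // Only emit a WHERE clause when at least one condition applies.
        String where = conditions.isEmpty() ? "" : " WHERE " + String.join(" AND ", conditions);
        return new Built("SELECT o FROM OrgEntity o" + where, params);
    }
}
```

Values are only ever bound as named parameters here; only the fixed condition fragments are concatenated into the query text.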
I have a large table that I'd like to access via a Spring Data Repository.
Currently, I'm trying to extend the PagingAndSortingRepository interface, but it seems I can only define methods that return lists, e.g.:
public interface MyRepository extends
        PagingAndSortingRepository<MyEntity, Integer>
{
    @Query(value="SELECT * ...")
    List<MyEntity> myQuery(Pageable p);
}
On the other hand, the findAll() method that comes with PagingAndSortingRepository returns an Iterable (and I suppose that the data is not loaded into memory).
Is it possible to define custom queries that also return Iterable and/or don't load all the data into memory at once?
Are there any alternatives for handling large tables?
We have the classical consulting answer here: it depends. As the implementation of the method is store specific, we depend on the underlying store API. In case of JPA there's no chance to provide streaming access as ….getResultList() returns a List. Hence we also expose the List to the client as especially JPA developers might be used to working with lists. So for JPA the only option is using the pagination API.
For a store like Neo4j we support the streaming access as the repositories return Iterable on CRUD methods as well as on the execution of finder methods.
The implementation of findAll() simply loads the entire list of all entities into memory. Its Iterable return type doesn't imply that it implements some sort of database level cursor handling.
On the other hand your custom myQuery(Pageable) method will only load one page worth of entities, because the generated implementation honours its Pageable parameter. You can declare its return type either as Page or List. In the latter case you still receive the same (restricted) number of entities, but not the metadata that a Page would additionally carry.
So you basically did the right thing to avoid loading all entities into memory in your custom query.
Please review the related documentation here.
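Walking a large table one page at a time could look roughly like the plain-Java sketch below. `PageWalker` and `forEachPage` are hypothetical names; the `fetchPage` function stands in for a repository call such as `myQuery(PageRequest.of(pageNumber, pageSize))`, so that only one page of entities is held in memory at a time.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Consumer;

// Hypothetical helper; fetchPage stands in for a Pageable-based repository method.
final class PageWalker {
    // fetchPage receives (pageNumber, pageSize) and returns at most pageSize items.
    // An empty or short page signals that the end has been reached.
    // Returns the total number of items handed to the handler.
    static <T> int forEachPage(BiFunction<Integer, Integer, List<T>> fetchPage,
                               int pageSize,
                               Consumer<List<T>> handler) {
        int pageNumber = 0;
        int total = 0;
        while (true) {
            List<T> page = fetchPage.apply(pageNumber, pageSize);
            if (page.isEmpty()) {
                break;
            }
            handler.accept(page);
            total += page.size();
            if (page.size() < pageSize) {
                break; // short page: nothing left to fetch
            }
            pageNumber++;
        }
        return total;
    }
}
```

Offset-based paging like this assumes a stable sort order, and it can skip or repeat rows if the table changes between page requests.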
I think what you are looking for is Spring Data JPA Stream. It brings a significant performance boost to data fetching, particularly in databases with millions of records. In your case there are several options you can consider:
Pull all data once in memory
Use pagination and read pages each time
Use something like Apache Spark
Streaming data using Spring Data JPA
In order to make Spring Data JPA Stream work, we need to modify MyRepository to return Stream<MyEntity> like this:
public interface MyRepository extends PagingAndSortingRepository<MyEntity, Integer> {
    @QueryHints(value = {
        @QueryHint(name = HINT_CACHEABLE, value = "false"),
        @QueryHint(name = READ_ONLY, value = "true")
    })
    @Query(value="SELECT * ...")
    Stream<MyEntity> myQuery();
}
In this example, we disable second-level caching and hint to Hibernate that the entities will be read-only. If your requirements are different, adjust those settings accordingly.
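One point worth adding, as general guidance rather than something taken from the answer above: the Spring Data documentation notes that a returned Stream may wrap store-specific resources and must be closed after use, typically with try-with-resources. The plain-Java sketch below uses `onClose` as a stand-in for releasing a database cursor; `StreamClosingDemo`, `openRows` and `countRows` are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.stream.Stream;

// Hypothetical demo; the onClose handler stands in for releasing a database cursor.
final class StreamClosingDemo {
    // Returns a stream that sets the flag when it is closed.
    static Stream<String> openRows(AtomicBoolean released) {
        return Stream.of("row-1", "row-2", "row-3")
                .onClose(() -> released.set(true));
    }

    // Consumes the stream inside try-with-resources so the close handler always runs.
    static long countRows(AtomicBoolean released) {
        try (Stream<String> rows = openRows(released)) {
            return rows.count();
        }
    }
}
```

A terminal operation such as `count()` on its own does not run the close handler, which is why the try-with-resources block matters. With JPA the stream also has to be consumed while the surrounding transaction is still open.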
I happened to find examples that use this construct, though I am not sure what I gain from it.
Does it mean that all select statements in a stateless EJB should follow this pattern?
@Stateless
public class EmployeeFacade {
    @PersistenceContext(unitName="EmployeeService")
    EntityManager em;
    // Read-only query: run it without a transaction
    @TransactionAttribute(TransactionAttributeType.NOT_SUPPORTED)
    public List<Employee> findAllEmployees() {
        return em.createQuery("SELECT e FROM Employee e",
                Employee.class)
                .getResultList();
    }
}
What do I get from this?
Thanks.
What you get is:
A relatively formal way to state that your method does not need a transaction (as a consequence you know, for example, that it will not call persist, merge or remove on the EntityManager).
A possible performance optimization in some cases.
No need to create or pass a transaction. According to the Java EE 5 Tutorial: "Because transactions involve overhead, this attribute may improve performance."
According to other sources (for example Pro JPA 2), it gives implementations the possibility of not creating managed entities at all (which is likely a heavier operation than creating detached entities right away).