We are about to build a SaaS application and are now deciding on the technology stack. We have developed earlier applications with Spring Boot and Hibernate, so our team is currently thinking of using the same stack for the new product.
But here are our concerns.
The applications we built earlier were all client-based applications without much traffic, whereas the application we are planning to build is a cloud-based product with very high expected traffic.
It will be a multi-tenant application, and based on growth we may need to scale resources horizontally. As we are planning to use cloud infrastructure, we need the ability to optimize queries in depth.
We should also have the option to implement second-level caching thoroughly.
We can't let the framework fire queries on its own; we need complete control over them (e.g. in Hibernate, child objects get loaded automatically when accessed).
With all these points in mind, will Hibernate serve the purpose? Or will it become very challenging to enhance or customize once the product grows? Are there other frameworks better suited to high-traffic scaling? Or should we write the entire persistence layer on our own?
Any suggestions?
Sure, Hibernate can be used for such scenarios. If you want to be in control of the queries, though, you should use a DTO approach to avoid lazy loading.
Coupled with a query builder like the one Blaze-Persistence provides, you can also make use of the more advanced features of your database. You could then use Blaze-Persistence Entity Views for DTOs. It is a library I created to allow easy mapping between JPA models and models defined as custom interfaces or abstract classes, something like Spring Data Projections on steroids. The idea is that you define your target structure (domain model) the way you like and map attributes (getters) via JPQL expressions to the entity model.
A sample DTO model could look like the following with Blaze-Persistence Entity-Views:
@EntityView(User.class)
public interface UserDto {
    @IdMapping
    Long getId();
    String getName();
    Set<RoleDto> getRoles();

    @EntityView(Role.class)
    interface RoleDto {
        @IdMapping
        Long getId();
        String getName();
    }
}
Querying is a matter of applying the entity view to a query, the simplest being just a query by id.
UserDto a = entityViewManager.find(entityManager, UserDto.class, id);
The Spring Data integration allows you to use it almost like Spring Data Projections: https://persistence.blazebit.com/documentation/entity-view/manual/en_US/index.html#spring-data-features
Page<UserDto> findAll(Pageable pageable);
The best part is that it will only fetch the state that is actually necessary!
Related
There are lots of articles about why not to use OSIV in production. Unfortunately, my app is already finished and I used open-in-view: true throughout development because it is the default setting and I did not know about this. Could you please advise me on the easiest way to convert the whole application?
Should I use
@PersistenceContext
private EntityManager em;
in every controller and call a native query?
Or do you have an example of a Spring Boot application without OSIV? Thank you
That's no easy task. If you rely on lazy loading outside of the data layer, you will have to rework the data layer to fit all those needs. The easiest way is to use @EntityGraph on your repository methods to control the fetching of associations. Sometimes you will have to duplicate methods for different use cases in order to apply different @EntityGraph annotations. There are still some issues you can run into with such a design, but this should get you pretty far already.
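For illustration, an @EntityGraph-based repository might look like this (the Order entity, its "items" association, and the method names are made up for the sketch):

```java
public interface OrderRepository extends JpaRepository<Order, Long> {

    // Fetches the "items" association together with the order in one
    // query, so no lazy loading is triggered outside the data layer.
    @EntityGraph(attributePaths = {"items"})
    Optional<Order> findWithItemsById(Long id);

    // Duplicate finder without the graph, for callers that
    // do not need the association.
    Optional<Order> findById(Long id);
}
```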
IMO the best solution is to use DTOs, as this will improve performance when done right and eliminate all lazy loading issues. The "problem", though, is that this approach might require quite a few changes in your application.
Either way, I would recommend you take a look at what Blaze-Persistence Entity Views has to offer as a way to implement a DTO approach.
I created the library to allow easy mapping between JPA models and models defined as custom interfaces or abstract classes, something like Spring Data Projections on steroids. The idea is that you define your target structure (domain model) the way you like and map attributes (getters) via JPQL expressions to the entity model.
A DTO model could look like the following with Blaze-Persistence Entity-Views:
@EntityView(User.class)
public interface UserDto {
    @IdMapping
    Long getId();
    String getName();
    Set<RoleDto> getRoles();

    @EntityView(Role.class)
    interface RoleDto {
        @IdMapping
        Long getId();
        String getName();
    }
}
Querying is a matter of applying the entity view to a query, the simplest being just a query by id.
UserDto a = entityViewManager.find(entityManager, UserDto.class, id);
The Spring Data integration allows you to use it almost like Spring Data Projections: https://persistence.blazebit.com/documentation/entity-view/manual/en_US/index.html#spring-data-features
UserDto findOne(Long id);
Using Entity-Views will improve performance immediately, as only the state that is actually necessary will be fetched.
I am working on a Spring Boot application with a CRUD API that takes and returns JSON objects. Is it okay to include the JSON->POJO and POJO->JSON logic in the service method? (The service method is marked with the @Transactional annotation.)
// Controller
public Map<String, String> getPersonNames() {
    return personSvc.getNames();
}

// Service method
@Transactional(readOnly = true)
public Map<String, String> getNames() {
    return populateNames(repo.findAll());
}

private Map<String, String> populateNames(final List<Person> personList) {
    return ImmutableMap.of(
        // Populate names into map
    );
}
Well, it mostly depends on the application you are building.
Based on the information you provided (almost none), I can only speak in general terms, but there is Domain-Driven Design (DDD), which is quite common for Spring applications. You can find more info in the answers to this question.
This kind of design separates your core domain logic from the logic your technology stack forces on you. Briefly speaking, it keeps the domain models (the objects you work with) at the core of your application.
Next, it wraps the core in an application layer (where the logic that relies on the domain models lives). The application layer only knows how to process the underlying models.
And the last wrapper is the (port) adapter layer. It adapts your logic to a specific technology. It can be, for example, an external API or a wrapper for MongoDB (while the application layer declares only an interface for collecting documents, this layer adapts (implements) it for the concrete technology). It can also provide marshalling/unmarshalling.
Maybe an example can explain it better:
The domain model is a document (an article) that your service works with.
The application layer knows how to process articles (collect, order, filter them), but knows nothing about JSON serialization.
The resource (aka port adapter) knows how to serialize a collection of articles to JSON and back, but that's the only thing it does. No logic here.
As you may have noticed, every layer knows only about the layers beneath it. An article does not know anything; it's just a model. The application knows how to process articles. And the adapter knows how to adapt the processing results for a concrete technology, JSON for instance.
So I would suggest you perform basic validation (not validation against domain/application-layer logic) and the marshalling/unmarshalling process at the highest level, at the resource (your @RestController's endpoints, for instance), since JSON is just a way to adapt your domain for external connections.
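A minimal plain-Java sketch of that layering (the Article/ArticleService/ArticleJsonAdapter names and the hand-rolled JSON are illustrative only; a real adapter would use Jackson or similar):

```java
import java.util.List;
import java.util.stream.Collectors;

// Domain model: just data, knows nothing about JSON or HTTP.
record Article(String title, int views) {}

// Application layer: processes domain objects, no serialization here.
class ArticleService {
    List<Article> popular(List<Article> articles, int minViews) {
        return articles.stream()
                .filter(a -> a.views() >= minViews)
                .collect(Collectors.toList());
    }
}

// Adapter (resource) layer: the only place that knows about JSON.
class ArticleJsonAdapter {
    String toJson(List<Article> articles) {
        return articles.stream()
                .map(a -> "{\"title\":\"" + a.title() + "\"}")
                .collect(Collectors.joining(",", "[", "]"));
    }
}
```

Swapping JSON for XML, or HTTP for messaging, then only touches the adapter; the service and model stay untouched.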
Yesterday I got access to a new project in my company and I found this:
public List<User> findActiveUsers() {
    return this.userRepository.findAll().stream()
        .filter(u -> u.isActive())
        .collect(Collectors.toList());
}
Is this a good way to find all the active users? Or should it be done in the repository, like this?
public interface UserRepository extends JpaRepository<User, Long> {
    @Query("SELECT user FROM User user WHERE user.active IS TRUE")
    List<User> findActiveUsers();
}
And if the first solution is correct, what about performance?
Firstly, both options fulfill the requirement.
However, option 2 makes more sense, as it filters the data at the query level rather than at the Java level: the database returns only the matching rows instead of every user being loaded into memory and filtered afterwards. I believe the performance would be better with the second option, though I don't have any data to back up this statement; this comment on performance is based on my experience.
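As a side note, assuming the User entity really has a boolean active property, Spring Data can derive the same query from the method name, without an explicit @Query:

```java
public interface UserRepository extends JpaRepository<User, Long> {

    // Spring Data derives "WHERE active = true" from the method name.
    List<User> findByActiveTrue();
}
```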
You can also consider whether caching (@Cacheable) can be used. That purely depends on the use case, i.e. how frequently the User entity changes and how frequently you would like to refresh the cache.
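If caching does fit the use case, the idea might be sketched like this (the service class, cache name, and eviction policy are illustrative assumptions, not part of the question):

```java
@Service
public class UserService {

    private final UserRepository repository;

    public UserService(UserRepository repository) {
        this.repository = repository;
    }

    // The first call hits the database; later calls are served from
    // the "activeUsers" cache until it is evicted.
    @Cacheable("activeUsers")
    public List<User> activeUsers() {
        return repository.findActiveUsers();
    }

    // Any write invalidates the cached list so readers never see
    // a stale active flag.
    @CacheEvict(value = "activeUsers", allEntries = true)
    public User save(User user) {
        return repository.save(user);
    }
}
```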
One disadvantage of using a native query is that Spring Data JPA currently doesn't support dynamic sorting for native queries.
Please refer to the similar question discussed in the link below. Although it is very much related to Hibernate, option 3 there (the @Query approach) is clearly preferred.
Spring Data Repository with ORM, EntityManager, #Query, what is the most elegant way to deal with custom SQL queries?
I have a large table that I'd like to access via a Spring Data Repository.
Currently, I'm trying to extend the PagingAndSortingRepository interface, but it seems I can only define methods that return lists, e.g.:
public interface MyRepository extends PagingAndSortingRepository<MyEntity, Integer> {
    @Query(value = "SELECT * ...")
    List<MyEntity> myQuery(Pageable p);
}
On the other hand, the findAll() method that comes with PagingAndSortingRepository returns an Iterable (and I suppose that the data is not loaded into memory).
Is it possible to define custom queries that also return Iterable and/or don't load all the data into memory at once?
Are there any alternatives for handling large tables?
We have the classic consulting answer here: it depends. As the implementation of the method is store-specific, we depend on the underlying store API. In the case of JPA, there's no way to provide streaming access, as ….getResultList() returns a List. Hence we also expose the List to the client, especially since JPA developers might be used to working with lists. So for JPA, the only option is using the pagination API.
For a store like Neo4j, we support streaming access, as the repositories return Iterable from CRUD methods as well as from finder methods.
The implementation of findAll() simply loads the entire list of entities into memory. Its Iterable return type doesn't imply that it implements some sort of database-level cursor handling.
On the other hand, your custom myQuery(Pageable) method will only load one page's worth of entities, because the generated implementation honours its Pageable parameter. You can declare its return type as either Page or List. In the latter case you still receive the same (restricted) number of entities, but not the metadata that a Page would additionally carry.
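To make the Page-vs-List distinction concrete, both declarations below restrict the result set to one page; only the first also carries the metadata, at the cost of an extra COUNT query (the method names and the status property are invented for the sketch):

```java
public interface MyRepository extends PagingAndSortingRepository<MyEntity, Integer> {

    // One page of results plus metadata (total elements, total pages);
    // Spring Data runs an extra COUNT query to fill in the metadata.
    Page<MyEntity> findByStatus(String status, Pageable pageable);

    // The same restricted result set, without the COUNT query.
    List<MyEntity> findAllByStatus(String status, Pageable pageable);
}
```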
So you basically did the right thing to avoid loading all entities into memory in your custom query.
Please review the related documentation here.
I think what you are looking for is Spring Data JPA's Stream support. It brings a significant performance boost to data fetching, particularly in databases with millions of records. In your case you have several options to consider:
Pull all data once in memory
Use pagination and read pages each time
Use something like Apache Spark
Streaming data using Spring Data JPA
To make Spring Data JPA Stream work, we need to modify MyRepository to return Stream<MyEntity>, like this:
public interface MyRepository extends PagingAndSortingRepository<MyEntity, Integer> {
    @QueryHints(value = {
        @QueryHint(name = HINT_CACHEABLE, value = "false"),
        @QueryHint(name = READ_ONLY, value = "true")
    })
    @Query(value = "SELECT * ...")
    Stream<MyEntity> myQuery();
}
In this example, we disable second-level caching and hint to Hibernate that the entities will be read-only. If your requirements are different, make sure to change those settings accordingly.
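One caveat worth showing: the returned Stream has to be consumed inside a transaction and closed afterwards, so the calling service would look roughly like this (the service class is an assumption for the sketch):

```java
@Service
public class MyEntityService {

    private final MyRepository repository;

    public MyEntityService(MyRepository repository) {
        this.repository = repository;
    }

    // The stream keeps the underlying JDBC resources open, so it must
    // be consumed inside a transaction and closed when done; hence
    // try-with-resources.
    @Transactional(readOnly = true)
    public long countAll() {
        try (Stream<MyEntity> entities = repository.myQuery()) {
            return entities.count();
        }
    }
}
```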
I just switched from ActiveRecord/NHibernate to Dapper. Previously, I had all of my queries in my controllers. However, some properties that were convenient to implement on my models (such as summaries/sums/totals/averages) I could calculate by iterating over instance variables (collections) in my model.
To be specific, my Project has a notion of AppSessions, and I can calculate the total number of sessions, plus the average session length, by iterating over someProject.AppSessions.
Now that I'm using Dapper, this seems muddled: my controller methods now query the database via Dapper (which seems okay), but my model class also queries the database via Dapper (which seems strange).
TLDR: Should the DB access go in my model, or controller, or both? It seems that both is not correct, and I would like to limit it to one "layer" so that changing DB access style later doesn't impact too much.
You should consider using the repository pattern.
With repositories, all of the database queries are encapsulated within a repository, which is exposed through a public interface, for example:
public interface IGenericRepository<T> where T : class
{
    T Get(object id);
    IQueryable<T> GetAll();
    void Insert(T entity);
    void Delete(T entity);
    void Save(T entity);
}
Then you can inject a repository into a controller:
public class MyController
{
    private readonly IGenericRepository<Foo> _fooRepository;

    public MyController(IGenericRepository<Foo> fooRepository)
    {
        _fooRepository = fooRepository;
    }
}
This keeps the UI free of any DB dependencies and makes testing easier; in unit tests you can inject any mock that implements IGenericRepository. It also allows the repository to implement and switch between technologies like Dapper or Entity Framework without any client changes, at any time.
The above example uses a generic repository, but you don't have to; you can create a separate interface for each repository, e.g. IFooRepository.
There are many examples and many variations of how the repository pattern can be implemented, so google around to understand it better. Here is one of my favorite articles on layered architectures.
Another note: for small projects, it should be OK to put queries directly into controllers...
I can't speak for Dapper personally, but I've always restricted my DB access to models only, except in very rare circumstances. That seems to make the most sense in my opinion.
A little more info: http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller
A model notifies its associated views and controllers when there has been a change in its state. This notification allows the views to produce updated output, and the controllers to change the available set of commands. A passive implementation of MVC omits these notifications, because the application does not require them or the software platform does not support them.
Basically, data access in models seems to be the standard.
I agree with @void-ray regarding the repository pattern. However, if you don't want to get into interfaces and dependency injection, you could still separate out your data access layer and use static methods to return data from Dapper.
When I am using Dapper, I typically have a repository library that returns very small objects or lists, which can then be mapped into a ViewModel and passed to the View (the mapping is done by StructureMap, but it could be handled in the controller or another helper).