Using findOne() / findAll() in Spring Boot for Cassandra DB

During code optimization I found a few places where I was calling findOne() inside a for loop:
public List<User> validateUsers(List<String> userIds) {
List<User> validUsers = new ArrayList<>();
for ( String userId : userIds) {
User user = userRepository.findOne(userId); //Network hit :: expensive call
//Perform validations
...
//Add valid users to validUsers list
...
}
return validUsers;
}
The method above is slow when I pass a large list of users to validate (around 5 seconds for 300 users).
I then changed it to call findAll() once and run the validations on the returned collection:
public List<User> validateUsers(List<String> userIds) {
List<User> validUsers = new ArrayList<>();
Iterable<User> itr = userRepository.findAll(userIds); //Only one Network hit
for ( User user : itr) {
//Perform validations
...
//Add valid users to validUsers list
...
}
return validUsers;
}
Now, for 300 users, the results come back in about 100 ms.
My question: are there any side effects of using findAll(), considering the underlying structure of Cassandra? Also, I am using CrudRepository. Should I use CassandraRepository instead?

Consider the following when you attempt this:
How big the users table is, if you are using findAll.
The partition keys of the users table.
Because Cassandra queries are fastest when they filter on primary key fields, findOne might perform better on large amounts of data.
However, you could try
List<T> findAllById(Iterable<ID> ids);
from org.springframework.data.cassandra.repository.CassandraRepository
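If the id list can grow very large, one common precaution is to cap how many keys go into a single multi-key lookup, since one request spanning many partitions puts more work on the coordinator node. The following is a minimal sketch in plain Java, not Spring Data code: ChunkedLookup and its methods are hypothetical names, and the batchLookup function stands in for a call such as findAllById on your repository.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper: splits a large id list into fixed-size chunks and
// issues one batch lookup per chunk instead of one lookup per id.
class ChunkedLookup {

    // Splits ids into consecutive sublists of at most chunkSize elements.
    static <T> List<List<T>> chunk(List<T> ids, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, ids.size());
            chunks.add(new ArrayList<>(ids.subList(start, end)));
        }
        return chunks;
    }

    // batchLookup is an assumption: it stands in for a repository call
    // that accepts several ids at once and returns the matching rows.
    static <I, R> List<R> findAllInChunks(List<I> ids, int chunkSize,
                                          Function<List<I>, List<R>> batchLookup) {
        List<R> results = new ArrayList<>();
        for (List<I> part : chunk(ids, chunkSize)) {
            results.addAll(batchLookup.apply(part));
        }
        return results;
    }
}
```

With 300 ids and a chunk size of 100 this makes three round trips instead of 300. The right chunk size depends on your cluster and data model, so treat the number as something to measure.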

Related

Memory leak with Criteria API Pageable

I implemented pageable functionality in a Criteria API query and noticed increased memory usage during query execution. I also used a spring-data-jpa method query to return the same result, and there the memory is cleaned up after every batch is processed. I tried detaching, flushing, and clearing objects from the EntityManager, but memory use keeps going up; occasionally it drops, but not as much as with method queries. My question is: what could cause this memory use if the objects are detached, and how do I deal with it?
Memory usage with Criteria API pageable:
Memory usage with method query:
Code
Since I'm also updating entities retrieved from the DB, I save the ID of the last processed entity, so that when an entity gets updated the query doesn't skip the next selected page. The code below is not from the real app I'm working on; it is just a recreation of the issue I'm having.
Repository code:
@Override
public Slice<Player> getPlayers(int lastId, Pageable pageable) {
List<Predicate> predicates = new ArrayList<>();
CriteriaBuilder criteriaBuilder = entityManager.getCriteriaBuilder();
CriteriaQuery<Player> criteriaQuery = criteriaBuilder.createQuery(Player.class);
Root<Player> root = criteriaQuery.from(Player.class);
predicates.add(criteriaBuilder.greaterThan(root.get("id"), lastId));
criteriaQuery.where(criteriaBuilder.and(predicates.toArray(Predicate[]::new)));
criteriaQuery.orderBy(criteriaBuilder.asc(root.get("id")));
var query = entityManager.createQuery(criteriaQuery);
if (pageable.isPaged()) {
int pageSize = pageable.getPageSize();
int offset = pageable.getPageNumber() > 0 ? pageable.getPageNumber() * pageSize : 0;
// Fetch additional element and skip it based on the pageSize to know hasNext value.
query.setMaxResults(pageSize + 1);
query.setFirstResult(offset);
var resultList = query.getResultList();
boolean hasNext = pageable.isPaged() && resultList.size() > pageSize;
return new SliceImpl<>(hasNext ? resultList.subList(0, pageSize) : resultList, pageable, hasNext);
} else {
return new SliceImpl<>(query.getResultList(), pageable, false);
}
}
Iterating through pageables:
@Override
public Slice<Player> getAllPlayersPageable() {
int lastId = 0;
boolean hasNext = false;
Pageable pageable = PageRequest.of(0, 200);
do {
var players = playerCriteriaRepository.getPlayers(lastId, pageable);
if(!players.isEmpty()){
lastId = players.getContent().get(players.getContent().size() - 1).getId();
for(var player : players){
System.out.println(player.getFirstName());
entityManager.detach(player);
}
}
hasNext = players.hasNext();
} while (hasNext);
return null;
}
I think you are running into a query plan cache issue here that is related to the use of the JPA Criteria API and how numeric values are handled. Hibernate will render all numeric values as literals into an intermediary HQL query string which is then compiled. As you can imagine, every "scroll" to the next page will be a new query string so you gradually fill up the query plan cache.
One possible solution is to use a library like Blaze-Persistence which has a custom JPA Criteria API implementation and a Spring Data integration that will avoid these issues and at the same time improve the performance of your queries due to a better pagination implementation.
All your code would stay the same, you just have to include the integration and configure it as documented in the setup section.
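To make the cache growth concrete, here is a toy model in plain Java. It is not Hibernate code: PlanCacheDemo and its methods are hypothetical, and a Set of query strings stands in for the query plan cache. It only shows why inlining lastId as a literal produces a new cache key on every page, while a bind parameter keeps the query text constant.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Toy model of a query plan cache keyed by query string (an assumption
// made for illustration; the real cache lives inside Hibernate).
class PlanCacheDemo {

    // Renders the page query with the id inlined as a literal.
    static String withLiteral(int lastId) {
        return "select p from Player p where p.id > " + lastId + " order by p.id asc";
    }

    // Renders the page query with a named bind parameter; the text never changes.
    static String withParameter() {
        return "select p from Player p where p.id > :lastId order by p.id asc";
    }

    // Counts the distinct query strings seen while scrolling through the pages.
    static int distinctPlans(int pages, int pageSize, boolean inlineLiterals) {
        Set<String> planCache = new LinkedHashSet<>();
        int lastId = 0;
        for (int page = 0; page < pages; page++) {
            planCache.add(inlineLiterals ? withLiteral(lastId) : withParameter());
            lastId += pageSize; // pretend ids are dense, so each page advances by pageSize
        }
        return planCache.size();
    }
}
```

In the repository code from the question, the equivalent change would be to compare against a parameter created with criteriaBuilder.parameter(Integer.class, "lastId") and to set its value with query.setParameter("lastId", lastId). Hibernate 5 also documents a hibernate.criteria.literal_handling_mode setting with a bind option, which may be worth checking for your version.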

How to fetch list of objects with same phone number

I have an entity for driving_info with a lot of fields, one of which is a phone number (the number the drive was ordered from).
What I am trying to do is fetch all drives that were ordered from that number. But when I pass the phone number as an int, I get
query did not return a unique result: 5; nested exception is javax.persistence.NonUniqueResultException: query did not return a unique result: 5
org.springframework.dao.IncorrectResultSizeDataAccessException: query did not return a unique result: 5; nested exception is javax.persistence.NonUniqueResultException: query did not return a unique result: 5
I actually want the list of results so that I can get a response of list of all drives that were ordered from that phone number.
My controller method is
@GetMapping("/users/{phone}")
public List<User> getUserByPhone(@PathVariable int phone) {
List<User> users= userService.findByPhone(phone);
if(users == null) {
throw new RuntimeException("User not found with "+phone+" phone number");
}
return users;
}
And my DAO is
@Override
@Transactional
public List<User> findByPhone(int phone) {
Session currentSession = entityManager.unwrap(Session.class);
Query<User> theQuery = currentSession.createQuery("from User where phone=:phone",User.class);
List<User> users = theQuery.getResultList();
return users;
}
Try to correct your query in this way:
List<User> users = currentSession.createQuery(
"select u from User u where u.phone = :phone",
User.class
).setParameter( "phone", phone )
.getResultList();
Please note that as it's stated in the documentation:
Even though HQL does not require the presence of a select_clause, it is generally good practice to include one. For simple queries the intent is clear and so the intended result of the select_clause is easy to infer. But on more complex queries that is not always the case.
It is usually better to explicitly specify intent. Hibernate does not actually enforce that a select_clause be present even when parsing JPQL queries, however, applications interested in JPA portability should take heed of this.
You need to call theQuery.list() instead.

Spring data JDBC query creation with pagination complains IncorrectResultSizeDataAccessException: Incorrect result size

I'm struggling to get the pagination feature working as described in the reference documentation.
This is my table schema:
CREATE TABLE cities
(
id int PRIMARY KEY,
name varchar(255),
pref_id int
);
Repository:
public interface CityRepository extends CrudRepository<CityEntity, Integer> {
Page<CityEntity> findAll(Pageable pageable);
// get all cities in the prefecture
Page<CityEntity> findByPrefId(Integer prefId, Pageable pageable);
}
Test code:
Page<CityEntity> allCities = repository.findAll(PageRequest.of(0, 10));
Page<CityEntity> cities = repository.findByPrefId(1, PageRequest.of(0, 10));
findAll works well, but findByPrefId throws the following error:
Incorrect result size: expected 1, actual 10
org.springframework.dao.IncorrectResultSizeDataAccessException: Incorrect result size: expected 1, actual 10
at org.springframework.dao.support.DataAccessUtils.nullableSingleResult(DataAccessUtils.java:100)
at org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate.queryForObject(NamedParameterJdbcTemplate.java:237)
at org.springframework.data.jdbc.repository.query.AbstractJdbcQuery.lambda$singleObjectQuery$1(AbstractJdbcQuery.java:115)
at org.springframework.data.jdbc.repository.query.PartTreeJdbcQuery.execute(PartTreeJdbcQuery.java:98)
at org.springframework.data.repository.core.support.QueryExecutorMethodInterceptor$QueryMethodInvoker.invoke(QueryExecutorMethodInterceptor.java:195)
at org.springframework.data.repository.core.support.QueryExecutorMethodInterceptor.doInvoke(QueryExecutorMethodInterceptor.java:152)
at org.springframework.data.repository.core.support.QueryExecutorMethodInterceptor.invoke(QueryExecutorMethodInterceptor.java:130)
...
If I change the method signature into List<CityEntity> findByPrefId(Integer prefId, Pageable pageable), it works.
Am I missing something? I'm using the latest version of spring-data-jdbc (2.0.2.RELEASE).
I don't know the technical details, but this is what I learned from experience.
In your case, if the total number of cities is smaller than pageable.getPageSize(), your repository will return a List<>.
But if the total number of cities is bigger than pageable.getPageSize(), your repository will return a Page<>.
Knowing that, this is what I did to work around it.
Long amount = repository.countByPrefId(prefId);
if (pagination.getPageSize() > amount) {
    List<CityEntity> list = repository.findByPrefId(prefId);
} else {
    Page<CityEntity> pages = repository.findByPrefId(prefId, PageRequest.of(0, 10));
}
This also means that your repository will have two different methods: one with Pageable as a parameter and one with only prefId as a parameter.
I believe the accepted answer is referring to Spring Data JPA, which does return pages based on a count query that is either derived from the custom query or set manually via countQuery, so there is no need for the if/else.
However, this flat out does not work in Spring Data JDBC:
https://jira.spring.io/browse/DATAJDBC-554
The workaround is provided in the link, but for reference:
interface FooRepository extends PagingAndSortingRepository<FooEntity, Long> {
List<FooEntity> findAllByBar(String bar, Pageable pageable);
Long countAllByBar(String bar);
}
And then combining those 2 queries like this:
List<FooEntity> fooList = repository.findAllByBar("...", pageable);
Long fooTotalCount = repository.countAllByBar("...");
Page<FooEntity> fooPage = PageableExecutionUtils.getPage(fooList, pageable, () -> fooTotalCount);
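The reason the workaround needs both queries is that a page is just one slice of rows plus the total row count; everything else is arithmetic. The sketch below shows that arithmetic in plain Java. SimplePage is a hypothetical class written for illustration, not Spring's PageImpl, and it assumes a positive page size.

```java
import java.util.List;

// Hypothetical stand-in for a page object: the rows of one page plus the
// total row count reported by a separate count query.
class SimplePage<T> {
    final List<T> content;
    final int pageNumber;     // zero-based page index
    final int pageSize;       // requested rows per page, must be positive
    final long totalElements; // result of the count query

    SimplePage(List<T> content, int pageNumber, int pageSize, long totalElements) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.content = content;
        this.pageNumber = pageNumber;
        this.pageSize = pageSize;
        this.totalElements = totalElements;
    }

    // Number of pages needed to hold totalElements rows (rounds up).
    int totalPages() {
        return (int) ((totalElements + pageSize - 1) / pageSize);
    }

    // True when at least one more page follows this one.
    boolean hasNext() {
        return pageNumber + 1 < totalPages();
    }
}
```

For example, 25 matching rows with a page size of 10 give three pages, and only the last one reports no next page.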

GraphQL Java: Using @Batched DataFetcher

I know how to retrieve a bean from a service in a datafetcher:
public class MyDataFetcher implements DataFetcher {
...
@Override
public Object get(DataFetchingEnvironment environment) {
return myService.getData();
}
}
But schemas with nested lists should use a BatchedExecutionStrategy and create batched DataFetchers with get() methods annotated @Batched (see graphql-java doc).
But where do I put my getData() call then?
///// Where to put this code?
List list = myService.getData();
/////
public class MyDataFetcher implements DataFetcher {
@Batched
public Object get(DataFetchingEnvironment environment) {
return list.get(environment.getIndex()); // where to get the index?
}
}
WARNING: The original BatchedExecutionStrategy has been deprecated and will get removed. The current preferred solution is the Data Loader library. Also, the entire execution engine is getting replaced in the future, and the new one will again support batching "natively". You can already use the new engine and the new BatchedExecutionStrategy (both in nextgen packages) but they have limited support for instrumentations. The answer below applies equally to both the legacy and the nextgen execution engine.
Look at it like this. Normal DataFetchers receive a single object as source (DataFetchingEnvironment#getSource) and return a single object as a result. For example, if you had a query like:
{
  user (name: "John") {
    company {
      revenue
    }
  }
}
Your company resolver (fetcher) would get a User object as source, and would be expected to somehow return a Company based on that e.g.
User owner = (User) environment.getSource();
Company company = companyService.findByOwner(owner);
return company;
Now, in the exact same scenario, if your DataFetcher was batched, and you used BatchedExecutionStrategy, instead of receiving a User and returning a Company, you'd receive a List<User> and would return a List<Company> instead.
E.g.
List<User> owners = (List<User>) environment.getSource();
List<Company> companies = companyService.findByOwners(owners);
return companies;
Notice that this means your underlying logic must have a way to fetch multiple things at once, otherwise it wouldn't be batched. So your myService.getData call would need to change, unless it can already fetch data for multiple source objects in one go.
Also notice that batched resolution makes sense in nested queries only, as the top-level resolver can already fetch a list of objects without the need for batching.
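The list-in, list-out contract described above can be sketched in plain Java without any graphql-java types. BatchedCompanyFetcher is a hypothetical class, strings stand in for the User and Company objects, and the map stands in for a bulk service call such as companyService.findByOwners(owners). The sketch assumes the results are expected to line up with the sources by position, one result per source.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical batched resolver: it receives every source object at once
// and returns one result per source, in the same order.
class BatchedCompanyFetcher {
    // In-memory stand-in for the data behind a bulk lookup: owner -> company.
    private final Map<String, String> companiesByOwner;
    int bulkLookups = 0; // counts simulated round trips

    BatchedCompanyFetcher(Map<String, String> companiesByOwner) {
        this.companiesByOwner = new HashMap<>(companiesByOwner);
    }

    List<String> fetch(List<String> owners) {
        bulkLookups++; // one round trip for the whole batch
        List<String> companies = new ArrayList<>(owners.size());
        for (String owner : owners) {
            // null keeps the position aligned when an owner has no company
            companies.add(companiesByOwner.get(owner));
        }
        return companies;
    }
}
```

Three owners resolved this way cost one lookup instead of three, which is the whole point of batching nested fields.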

How can I cache a database query with "IN" operator?

I'm using Spring Boot with Spring Cache. I have a method that, given a list of ids, returns a list of Food that match with those ids:
public List<Food> get(List<Integer> ids) {
return "select * from FOOD where FOOD_ID in ids"; // << pseudo-code
}
I want to cache the results by id. Imagine that I do:
List<Food> foods = get(asList(1, 5, 7));
and then:
List<Food> foods = get(asList(1, 5));
I want the Food with id 1 and the Food with id 5 to be retrieved from the cache. Is it possible?
I know I can do a method like:
@Cacheable(key = "#id")
public Food getById(Integer id) {
...
}
and iterate over the ids list, calling it each time, but in that case I don't take advantage of the SQL IN operator, right? Thanks.
The key attribute of @Cacheable takes a SpEL expression to calculate the cache key. SpEL does not support Java lambdas, so a stream pipeline cannot be written inline in the expression, but you should be able to do something like
@Cacheable(key = "#ids.toString()")
This would require the ids to always be in the same order, and it caches the whole result list under one key rather than each Food under its own id.
https://docs.spring.io/spring/docs/current/spring-framework-reference/html/cache.html#cache-annotations-cacheable-key
A better option would be to create a class to wrap around your ids that would be able to generate the cache key for you, or some kind of utility class function.
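One possible shape for such a utility is sketched below in plain Java. IdListKey is a hypothetical name; it sorts and de-duplicates the ids before joining them, so the same set of ids always yields the same key regardless of the order in which the caller passes them.

```java
import java.util.List;
import java.util.TreeSet;

// Hypothetical utility that builds an order-independent cache key from a list of ids.
class IdListKey {

    // Returns the ids sorted ascending, without duplicates, joined by commas.
    static String of(List<Integer> ids) {
        StringBuilder key = new StringBuilder();
        for (Integer id : new TreeSet<>(ids)) { // TreeSet sorts and removes duplicates
            if (key.length() > 0) {
                key.append(',');
            }
            key.append(id);
        }
        return key.toString();
    }
}
```

If the class were public and on the classpath, a SpEL key expression could call it through the T(...) type operator with its fully qualified name. Note that this still caches one entry per distinct id set, so asking for ids 1 and 5 after ids 1, 5 and 7 is a cache miss.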
Another possible solution without @Cacheable would be to inject the cache manager into the class like:
@Autowired
private CacheManager cacheManager;
You can then retrieve the food cache from the cache manager by name:
Cache cache = cacheManager.getCache("cache name");
Then you could adjust your method to take in the list of ids and manually get and put the values in the cache:
cache.get(id);
cache.put(id, food);
You will most likely still not be able to use the SQL IN clause, but you are at least handling the iteration inside the method and not everywhere this method is called, and leveraging the cache whenever possible.
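public List<Food> get(List<Integer> ids) {
    List<Food> result = new ArrayList<>();
    for (Integer id : ids) {
        // Attempt to fetch from cache; returns null on a cache miss
        Food food = cache.get(id, Food.class);
        if (food == null) {
            // Fetch from DB; fetchFromDb is a hypothetical placeholder for your own lookup
            food = fetchFromDb(id);
            cache.put(id, food);
        }
        result.add(food);
    }
    return result;
}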
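A variation on the loop above may recover the IN query: collect the ids that are missing from the cache first, then load all of them in one bulk call and cache each result under its own id. The sketch below is plain Java under stated assumptions: BatchCacheAside is a hypothetical class, a HashMap stands in for the Spring Cache, and bulkLoader stands in for a single "where FOOD_ID in (...)" query that returns the rows keyed by id.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical per-id cache that loads all cache misses with one bulk call.
class BatchCacheAside<K, V> {
    private final Map<K, V> cache = new HashMap<>();       // stand-in for a Spring Cache
    private final Function<List<K>, Map<K, V>> bulkLoader; // stand-in for one IN query

    BatchCacheAside(Function<List<K>, Map<K, V>> bulkLoader) {
        this.bulkLoader = bulkLoader;
    }

    List<V> get(List<K> ids) {
        // 1. Find the ids that are not cached yet (each id at most once).
        List<K> misses = new ArrayList<>();
        for (K id : ids) {
            if (!cache.containsKey(id) && !misses.contains(id)) {
                misses.add(id);
            }
        }
        // 2. Load every miss in a single bulk call and cache each row by its id.
        if (!misses.isEmpty()) {
            cache.putAll(bulkLoader.apply(misses));
        }
        // 3. Assemble the result in the order the ids were requested.
        List<V> result = new ArrayList<>();
        for (K id : ids) {
            V value = cache.get(id);
            if (value != null) { // ids the loader could not find are skipped
                result.add(value);
            }
        }
        return result;
    }
}
```

With this shape, asking for ids 1, 5 and 7 issues one bulk load, a later request for ids 1 and 5 issues none, and a request for ids 1 and 9 loads only id 9. Ids that do not exist in the database are not cached here, so they are looked up again on every call.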
Relevant Javadocs:
http://docs.spring.io/spring/docs/current/javadoc-api/org/springframework/cache/CacheManager.html
http://docs.spring.io/spring/docs/current/javadoc-api/org/springframework/cache/Cache.html

Resources