Is there a way to have custom SQL query on top of JPA repository to have BULK UPSERTS? - spring-boot

I have a Snowflake database, and Snowflake doesn't enforce unique constraints (https://docs.snowflake.com/en/sql-reference/constraints-overview.html).
I'm planning to add a method to my JPA repository with a custom SQL query that checks for duplicates before inserting into the table.
Entity
@Entity
@Table(name = "STUDENTS")
public class Students {
@Id
@Column(name = "ID", columnDefinition = "serial")
@GenericGenerator(name = "id_generator", strategy = "increment")
@GeneratedValue(generator = "id_generator")
private Long id;
@Column(name = "NAME")
private String studentName;
}
Snowflake create table query
CREATE table STUDENTS(
id int identity(1,1) primary key,
name VARCHAR NOT NULL,
UNIQUE(name)
);
Repository
public interface StudentRepository extends JpaRepository<Students, Long> {
//
@Query(value = "???", nativeQuery = true)
List<Student> bulkUpsertStudents(List<Student> students);
}

You can use a SELECT query to check for duplicate values in the name column before inserting a new record into the table. For example:
@Query(value = "SELECT * FROM STUDENTS WHERE name = :name", nativeQuery = true)
List<Student> findByName(@Param("name") String name);
This method will return a list of Student records with the specified name value. If the list is empty, it means that there are no records with that name value, and you can safely insert a new record with that name value.
List<Student> studentsToInsert = new ArrayList<>();
for (Student student : students) {
// one lookup per student; an empty result means the name is not stored yet
List<Student> existingStudents = studentRepository.findByName(student.getName());
if (existingStudents.isEmpty()) {
studentsToInsert.add(student);
}
}
studentRepository.bulkUpsertStudents(studentsToInsert);
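The loop above runs one SELECT per student. A minimal sketch of the same check done in memory follows, assuming the names already stored were fetched beforehand in a single query. `NewNameFilter` and `namesToInsert` are hypothetical names, not part of the original post, and how the existing names are loaded is left open.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not from the original post.
final class NewNameFilter {

    private NewNameFilter() {}

    /**
     * Returns the candidate names that are neither already stored nor
     * repeated earlier in the same batch, preserving input order.
     */
    static List<String> namesToInsert(List<String> candidates, Set<String> existingNames) {
        Set<String> seen = new HashSet<>(existingNames);
        List<String> result = new ArrayList<>();
        for (String name : candidates) {
            // Set.add returns false when the name was already present.
            if (seen.add(name)) {
                result.add(name);
            }
        }
        return result;
    }
}
```

Because the database does not enforce the constraint, a check like this only narrows the window for duplicates: two writers running at the same time can still insert the same name.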
EDIT
If the above solution doesn't work, you can use a MERGE statement to update existing records in the table when the data has changed. For example, to update the name of a Student if it has changed:
@Modifying
@Query(value = "MERGE INTO students t USING (SELECT :name AS name, :newName AS newName) s "
+ "ON t.name = s.name "
+ "WHEN MATCHED AND t.name <> s.newName THEN UPDATE SET t.name = s.newName "
+ "WHEN NOT MATCHED THEN INSERT (name) VALUES (s.name)", nativeQuery = true)
int upsertStudent(@Param("name") String name, @Param("newName") String newName);
Each call handles one student, since the query binds a single :name and :newName pair. It updates the name of the matching row if it has changed and inserts a new row only when no row with that name exists. This keeps name values unique without a separate SELECT for each record.
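The MERGE statement above binds a single :name value. A rough sketch of extending the idea to many rows in one statement is to generate the statement text with one positional placeholder per row. `BulkMergeSql` is a hypothetical helper, and it is an assumption to verify that your Snowflake driver accepts bind placeholders inside a VALUES list. The sketch covers only the insert-if-absent half of the upsert.

```java
import java.util.Collections;

// Hypothetical helper, not from the original post.
final class BulkMergeSql {

    private BulkMergeSql() {}

    /**
     * Builds an insert-if-absent MERGE for rowCount names, with one
     * positional placeholder per row.
     */
    static String insertIfAbsent(int rowCount) {
        if (rowCount < 1) {
            throw new IllegalArgumentException("rowCount must be at least 1");
        }
        // "(?), (?), (?)" for rowCount = 3
        String rows = String.join(", ", Collections.nCopies(rowCount, "(?)"));
        return "MERGE INTO students t "
                + "USING (SELECT DISTINCT column1 AS name FROM VALUES " + rows + ") s "
                + "ON t.name = s.name "
                + "WHEN NOT MATCHED THEN INSERT (name) VALUES (s.name)";
    }
}
```

The DISTINCT matters: without it, a name that appears twice in the same batch would match nothing in the table twice and be inserted twice.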

I was able to work around this with the approach below, but I still need to verify the performance of the queries.
1. Use the repository's saveAll() method to save all the entities.
2. Run the custom native query below to remove the duplicates:
INSERT OVERWRITE INTO STUDENTS
WITH CTE AS(SELECT ROW_NUMBER() OVER (PARTITION BY NAME ORDER BY ID) AS RNO, ID, NAME FROM STUDENTS)
SELECT ID, NAME FROM CTE WHERE RNO = 1;
Example code :
import static io.vavr.collection.List.ofAll;
import static io.vavr.control.Option.of;
import static java.util.function.Predicate.not;
public Validation<ValidationError, List<Students>> saveAll(List<String> students) {
return of(students)
.filter(not(List::isEmpty))
.map(this::mapToEntities) // maps the list to list of database entities
.map(repository::saveAll) // save all
.toValidation(ERROR_SAVING_STUDENTS) // vavr validation in case of error
.peek(x -> repository.purgeStudents()) // purging to remove duplicates
.toValidation(ERROR_PURGING_STUDENTS);
}
This workaround is needed only because Snowflake does not enforce uniqueness constraints.
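The purge query keeps, for each NAME, the row with the smallest ID and rewrites the table with only those rows. To make that rule easy to reason about or unit test, here is a minimal in-memory sketch of the same selection. `DuplicatePurge` and `StudentRow` are hypothetical names introduced for illustration only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names; mirrors ROW_NUMBER() OVER (PARTITION BY NAME ORDER BY ID) = 1.
final class DuplicatePurge {

    record StudentRow(long id, String name) {}

    private DuplicatePurge() {}

    /** Keeps the row with the smallest id for each name, ordered by id. */
    static List<StudentRow> keepFirstPerName(List<StudentRow> rows) {
        Map<String, StudentRow> firstByName = new LinkedHashMap<>();
        for (StudentRow row : rows) {
            // merge keeps whichever of the two rows has the lower id
            firstByName.merge(row.name(), row, (a, b) -> a.id() <= b.id() ? a : b);
        }
        List<StudentRow> survivors = new ArrayList<>(firstByName.values());
        survivors.sort(Comparator.comparingLong(StudentRow::id));
        return survivors;
    }
}
```

Since the SQL version rewrites the whole table on every call, its cost grows with the table size rather than with the size of the batch just saved, which is worth measuring when verifying performance.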

Related

Spring OneToMany - how to limit list of objects to list of one field from that object

I wonder if it's possible to fetch a list of one specific field of the related objects, instead of a list of whole objects, from a @OneToMany relation:
@Entity
public class Template
...
private Driver driver;
private boolean isOpen;
@OneToMany(
mappedBy = "template"
)
private List<Warehouse> warehouses = new ArrayList<>();
I want to fetch list of Template objects with list of Warehouse.name (List<String>) instead of List<Warehouse>. Is it possible?
My repository:
@QueryHints(value = {
@QueryHint(name = org.hibernate.jpa.QueryHints.HINT_PASS_DISTINCT_THROUGH, value = "false")
})
@Query("SELECT at FROM Template at " +
"WHERE at.driver.id = :companyId " +
"AND at.isOpen = true")
@EntityGraph(attributePaths = {"warehouses"})
List<Template> findAllOpenByCompanyId(Long companyId, Pageable pageable);
I want to reduce the number of queries to the database.
I would try using an @ElementCollection with a @CollectionTable instead of the @OneToMany.
It would look like this:
@Entity
public class Template
...
private Driver driver;
private boolean isOpen;
@ElementCollection
@CollectionTable(
name="the name of the warehouse table",
joinColumns=@JoinColumn(name="warehouse id column")
)
@Column(name="warehouse name column in warehouse table")
private List<String> warehouseNames = new ArrayList<>();
I'm unable to test this at the moment, but hopefully it helps.

Order by @OneToMany field using JPA Specification

Consider the entities below -
@Entity
public class Employee {
@Id
@GeneratedValue
private long id;
private String name;
@OneToMany(mappedBy = "employee", fetch = FetchType.LAZY, cascade = CascadeType.ALL)
private List<Phone> phones; //contains both "active" & "inactive" phones
}
@Entity
public class Phone {
@Id
@GeneratedValue
private long id;
private boolean active;
private String number;
@ManyToOne(fetch = FetchType.LAZY)
private Employee employee;
}
I need to pull all the employees and sort them depending on the count of "active" phones they have.
Please note that the employee can have active as well as inactive phones. So the query I am trying to achieve is
ORDER BY (SELECT
COUNT(phone4_.employee_id)
FROM
phone phone4_
WHERE
employee4_.id = phone4_.employee_id
AND phone4_.active = true
) DESC
For some reason I am stuck on the Specification; below is the code I have used:
List<Order> orders = new ArrayList<>();
orders.add(cb.desc(cb.size(employee.get("phones"))));
cq.orderBy(orders);
When I run the code, the query that gets generated is
ORDER BY (SELECT
COUNT(phone4_.employee_id)
FROM
phone phone4_
WHERE
employee4_.id = phone4_.employee_id) DESC
I am unable to add an extra AND condition to the logic. Please suggest
As specified in the Persistence API specification:
4.6.16 Subqueries
Subqueries may be used in the WHERE or HAVING clause.
JPA doesn't support subqueries in the order by clause, nor in the select clause.
Hibernate ORM, though, supports them in the SELECT and WHERE clauses.
So you cannot write that query and remain JPA compliant.
This HQL should work though and it's covered by Hibernate ORM:
SELECT e1, (SELECT count(p)
FROM Phone p
WHERE p.active = true AND p.employee = e1) as activeCount
FROM Employee e1
ORDER BY activeCount DESC
Surprisingly, writing this query with criteria doesn't work:
CriteriaBuilder builder = ormSession.getCriteriaBuilder();
CriteriaQuery<Object> criteria = builder.createQuery();
Root<Employee> root = criteria.from( Employee.class );
Subquery<Long> activePhonesQuery = criteria.subquery( Long.class );
Root<Phone> phoneRoot = activePhonesQuery.from( Phone.class );
Subquery<Long> phonesCount = activePhonesQuery
.select( builder.count( phoneRoot ) )
.where( builder.and( builder.isTrue( phoneRoot.get( "active" ) ), builder.equal( phoneRoot.get( "employee" ), root ) ) );
criteria.multiselect( root, phonesCount )
.orderBy( builder.desc( phonesCount ) );
The reason is that Hibernate ORM tries to expand the subquery in the ORDER BY clause instead of referring to the alias. And as I mentioned before, this is not supported.
I think the HQL is the easiest option if you don't want to use native queries.
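If neither HQL nor a native query is an option and the result set is small, the ordering can also be applied in memory after loading. This is a different technique from the query above, not a translation of it: it requires the phones of every employee to be loaded and does not combine with database-side pagination. A minimal sketch, using plain records as stand-ins for the entities (all names here are hypothetical):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain stand-ins for the Employee and Phone entities; illustration only.
final class ActivePhoneSort {

    record Phone(boolean active, String number) {}

    record Employee(String name, List<Phone> phones) {}

    private ActivePhoneSort() {}

    /** Counts only the phones flagged as active. */
    static long activeCount(Employee employee) {
        return employee.phones().stream().filter(Phone::active).count();
    }

    /** Returns a new list ordered by number of active phones, highest first. */
    static List<Employee> byActivePhonesDesc(List<Employee> employees) {
        List<Employee> sorted = new ArrayList<>(employees);
        sorted.sort(Comparator.<Employee>comparingLong(ActivePhoneSort::activeCount).reversed());
        return sorted;
    }
}
```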

Hibernate performs update and delete on custom JPQL

I am trying to update the fields of an entity that has a ManyToMany relationship, but I want to update only the table's own fields and leave the relationship untouched. The relationship is between the Company and UserSystem entities, with company_user_system as the join table. The problem is that whenever I execute my update on Company, Hibernate first issues its own update on company and a delete on company_user_system, which erases the relationship between Company and UserSystem. I don't understand why these two queries occur, since my code doesn't execute them.
These are the queries; the first and second are not executed by my code:
Hibernate: update company set active=?, email=?, identification_code=?, trading_name=?, update_on=? where id=?
Hibernate: delete from company_user_system where company_id=?
Hibernate: update company set email=?, phone=?, corporate_name=?, trading_name=?, identification_code=?, email=?, phone2=? where id=?
Hibernate: select company0_.id as id1_0_, company0_.active as active2_0_, company0_.corporate_name as corporat3_0_, company0_.created_on as created_4_0_, company0_.email as email5_0_, company0_.email2 as email6_0_, company0_.identification_code as identifi7_0_, company0_.phone as phone8_0_, company0_.phone2 as phone9_0_, company0_.trading_name as trading10_0_, company0_.update_on as update_11_0_ from company company0_ where company0_.id=?
Following is the update implementation code:
public class CompanyRepositoryImpl implements CompanyRepositoryCustom {
@PersistenceContext
private EntityManager entityManager;
public Company updateCompanyFields(Company company) {
// ... fieldSql implementation omitted
String sql = "UPDATE Company SET "+ fieldsSql +" WHERE id = :id ";
Query query = entityManager.createQuery(sql);
// set the values for the fields
for (Method method : getMethods) {
query.setParameter(lowercaseFirstCharInString(cutGetInMethods(method.getName())), method.invoke(company));
}
// set id
query.setParameter("id", company.getId());
// execute update and search the database to return the updated object
if (query.executeUpdate() == 1) {
query = entityManager.createQuery("SELECT c FROM Company c WHERE c.id = :id");
query.setParameter("id", company.getId());
Company getCompany = (Company) query.getResultList().get(0);
return getCompany;
}
return null;
}
// ... Other methods omitted
}
Repository Code:
@Repository
public interface CompanyRepository extends JpaRepository<Company, Long>, JpaSpecificationExecutor<Company>, CompanyRepositoryCustom {
@Modifying
Company updateCompanyFields(Company company);
}
Company entity code (I included only the attributes that I think may be useful for solving the problem):
@Entity
@DynamicUpdate
@Table(name = "company")
public class Company implements Serializable {
@CreationTimestamp
@Column(name = "created_on", nullable = false)
private Instant createdOn;
@UpdateTimestamp
@Column(name = "update_on")
private Instant updateOn;
@ManyToMany
@JoinTable(
name = "company_user_system",
joinColumns = @JoinColumn(
name = "company_id", referencedColumnName = "id"
),
inverseJoinColumns = @JoinColumn(
name = "user_system_id", referencedColumnName = "id"
)
)
private Set<UserSystem> userSystems = new HashSet<>();
}
The UserSystem class defines the relationship as follows:
@ManyToMany(mappedBy = "userSystems")
private Set<Company> companies = new HashSet<>();
What may be causing this update and delete before my update?
This happens because the value of the relationship was changed somewhere in your code. The EntityManager tracks such changes and marks the entity as dirty. When you execute a custom query, Hibernate first flushes the persistence context, which runs all pending statements for dirty entities.
You can prevent this by calling EntityManager.clear() before the update; it detaches all managed entities and discards their unflushed changes.

Spring Data JPA query to select all value from one join table

I have a problem selecting all values from one table plus a few other columns using Spring Data JPA. I am using a PostgreSQL database; when I run the query through pgAdmin I get the values I want, but through my Spring Boot REST endpoint it returns only the values of one table (the subquery columns are missing). What am I doing wrong?
#Query(value = "SELECT item.*, MIN(myBid.bid) AS myBid, (SELECT MIN(lowestBid.bid) AS lowestbid FROM bids lowestBid WHERE lowestBid.item_id = item.item_id GROUP BY lowestBid.item_id) FROM item JOIN bids myBid ON item.item_id = myBid.item_id WHERE myBid.user_id = :user_id GROUP BY item.item_id", nativeQuery = true)
public List<Item> findAllWithDescriptionQuery(@Param("user_id") UUID userId);
Added Item class
@Data
@Entity(name = "item")
public class Item {
@Id
@GeneratedValue
private UUID itemId;
@NotNull
@Column(name = "title")
@Size(max = 255)
private String title;
@NotNull
@Column(name = "description")
private String description;
@NotNull
@Column(name = "created_user_id")
private UUID createdUserId;
}
The result from your native query cannot simply be mapped to entities due to the in-database aggregation performed to calculate the MIN of own bids, and the MIN of other bids. In particular, your Item entity doesn't carry any attributes to hold myBid or lowestbid.
What you want to return from the query method is therefore a Projection. A projection is a mere interface with getter methods matching exactly the fields returned by your query:
public interface BidSummary {
UUID getItem_id();
String getTitle();
String getDescription();
double getMyBid();
double getLowestbid();
}
Notice how the query method returns the BidSummary projection:
#Query(value = "SELECT item.*, MIN(myBid.bid) AS myBid, (SELECT MIN(lowestBid.bid) AS lowestbid FROM bids lowestBid WHERE lowestBid.item_id = item.item_id GROUP BY lowestBid.item_id) FROM item JOIN bids myBid ON item.item_id = myBid.item_id WHERE myBid.user_id = :user_id GROUP BY item.item_id", nativeQuery = true)
public List<BidSummary> findOwnBids(@Param("user_id") UUID userId);
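To see why the getter names have to line up with the column aliases, here is a toy illustration of the idea using a JDK dynamic proxy over one result row. It is not Spring Data's implementation. `AliasProjection` and the trimmed-down `BidView` interface are hypothetical, and the row is modeled as a plain map from alias to value.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Map;

// Toy illustration only; this is not Spring Data's implementation.
final class AliasProjection {

    /** Hypothetical, trimmed-down version of the BidSummary interface. */
    interface BidView {
        String getTitle();
        double getMyBid();
    }

    private AliasProjection() {}

    /** Wraps one result row (alias -> value) behind the given getter interface. */
    static <T> T project(Class<T> type, Map<String, Object> row) {
        InvocationHandler handler = (proxy, method, args) -> {
            String getter = method.getName();
            if (!getter.startsWith("get") || getter.length() == 3) {
                throw new UnsupportedOperationException(getter);
            }
            // getMyBid -> myBid: the property name doubles as the column alias
            String alias = Character.toLowerCase(getter.charAt(3)) + getter.substring(4);
            if (!row.containsKey(alias)) {
                throw new IllegalStateException("No column aliased as " + alias);
            }
            return row.get(alias);
        };
        return type.cast(Proxy.newProxyInstance(type.getClassLoader(), new Class<?>[] {type}, handler));
    }
}
```

In this model, a getter with no matching alias fails only when it is called, which is one reason to give every computed column in the query an explicit alias.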
The return type is a List of Item objects, but the query selects columns that are not part of that type. I recommend using an appropriate entity that fulfills your response type.

How to avoid N+1 queries for primary entities, when using Custom DTO Object

I have a simple DTO object as follows:
@BatchSize(size=100)
public class ProductDto {
@BatchSize(size=100)
private Product product;
private UUID storeId;
private String storeName;
public ProductDto(Product product, UUID storeId, String storeName) {
super();
this.product = product;
this.storeId = storeId;
this.storeName = storeName;
}
// setters and getters ...
}
I have a Spring Data repository as follows:
public interface ProductRepository extends CrudRepository<Product, java.util.UUID>
{
@BatchSize(size=100)
@Query("SELECT new com.app1.dto.ProductDto(c, r1.storeName, r1.id) "
+ "FROM Product c left outer join c.storeRef r1")
public Page<ProductDto> findProductList_withDTO(Pageable pageable);
}
When the code runs, it executes one SQL for loading Product ID and StoreName and StoreId
SELECT product0_.id AS col_0_0_,store1_.store_name AS col_1_0_,
store1_.id AS col_2_0_ FROM Product product0_
LEFT OUTER JOIN Store store1_ LIMIT 10
But the problem is the next step.
For each Product row in the result, Hibernate executes a separate SQL query instead of using IN to batch-load all matching Products in one statement.
For each product it executes the following:
SELECT product0_.id AS id1_20_0_, product0_.prd_name AS prd_nam15_20_0_ FROM Product product0_ WHERE product0_.id = ?
SELECT product0_.id AS id1_20_0_, product0_.prd_name AS prd_nam15_20_0_ FROM Product product0_ WHERE product0_.id = ?
Question,
How can we tell Hibernate to load all Product rows in a single SQL query? I have tried adding @BatchSize(size=100) everywhere, but it still executes multiple queries to load the Product data.
Any hints/solution will be appreciated.
Thanks
Mark
