PageRequest and OrderBy method name Issue - spring

In our Spring application we have a table that contains a lot of "Payment" records. We need a query that pages through the results, sorted from the largest total to the smallest. The problem is that sometimes the same record appears in two successive pages.
We create a PageRequest and pass it to the repository. Here is our implementation:
Repository:
    public interface StagingPaymentEntityRepository extends JpaRepository<StagingPaymentEntity, Long> {
        Page<StagingPaymentEntity> findAllByStatusAndCreatedDateLessThanEqualAndOperationTypeOrderByEffectivePaymentDesc(String status, Timestamp batchStartTimestamp, String operationType, Pageable pageable);
    }
    public class BatchThreadReiteroStorni extends ThreadAbstract<StagingPaymentEntity> {
        PageRequest pageRequest = PageRequest.of(index, 170);
        Page<StagingPaymentEntity> records = ((StagingPaymentEntityRepository) repository).findAllByStatusAndCreatedDateLessThanEqualAndOperationTypeOrderByEffectivePaymentDesc("REITERO", batchStartTimestamp, "STORNO", pageRequest);
    }
where index is the index of the page we are requesting.
Is there a way to understand why this is happening? Thanks for your support.

This can have multiple reasons.
Non-deterministic ordering: if the ordering you are using isn't deterministic, i.e. there are rows that might come in any order, that order might change between selects, resulting in items getting skipped or returned multiple times. Fix: add the primary key as the last column of the ordering.
If you change the entities in a way that affects the ordering, or another process does that, you might end up with items getting processed multiple times.
In this scenario I see a couple of approaches:
Do value-based pagination, i.e. don't select pages but select the next N rows after the last row you have already processed.
Instead of paging, use a Stream. This allows you to use a single select while still processing the results one element at a time. You might have to flush and evict entities, and I'm not 100% sure that works, but it is certainly worth a try.
Finally, you can mark all rows that you want to process in a separate column, then select N marked entities and unmark them once they are processed.
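To make the first fix and the value-based pagination idea concrete, here is a minimal in-memory sketch in plain Java (16+). It is not Spring Data code: the Payment record, the KeysetPaging class and its methods are hypothetical stand-ins, and the list plays the role of the table. The point is only that a total order (effectivePayment descending, then id as tie-breaker) plus "give me the next N rows after the last one I saw" never repeats a row, even when many rows share the same total.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for StagingPaymentEntity: only the fields used for ordering.
record Payment(long id, long effectivePayment) {}

class KeysetPaging {
    // Total order: largest effectivePayment first, ties broken by ascending id.
    static final Comparator<Payment> ORDER =
            Comparator.comparingLong(Payment::effectivePayment).reversed()
                      .thenComparingLong(Payment::id);

    // Up to pageSize rows that sort strictly after `last` (null means "start at the top").
    static List<Payment> nextPage(List<Payment> table, Payment last, int pageSize) {
        return table.stream()
                .sorted(ORDER)
                .filter(p -> last == null || ORDER.compare(p, last) > 0)
                .limit(pageSize)
                .toList();
    }

    // Walks the whole table page by page and returns the ids in the order they were read.
    static List<Long> readAll(List<Payment> table, int pageSize) {
        List<Long> seen = new ArrayList<>();
        Payment last = null;
        while (true) {
            List<Payment> page = nextPage(table, last, pageSize);
            if (page.isEmpty()) {
                break;
            }
            page.forEach(p -> seen.add(p.id()));
            last = page.get(page.size() - 1); // remember the key of the last row
        }
        return seen;
    }

    public static void main(String[] args) {
        List<Payment> table = List.of(
                new Payment(1, 500), new Payment(2, 900), new Payment(3, 500),
                new Payment(4, 500), new Payment(5, 100));
        System.out.println(readAll(table, 2)); // prints [2, 1, 3, 4, 5]
    }
}
```

In a real database the filter would become a WHERE condition on (effectivePayment, id) and the sort an ORDER BY on the same two columns; how to express that in your repository is left open here.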

Adding a custom sorting to listing with an aggregate in shopware 6

I am trying to build a custom sorting for the product listings in Shopware 6.
I want to include a foreign table (the entity is leasingPlanEntity), get the minimum of one of its fields (period_price), and then order the search result by that value.
I have already built a Subscriber and tried it like this, which seems to work:
    public static function getSubscribedEvents(): array
    {
        return [
            //ProductListingCollectFilterEvent::class => 'addFilter'
            ProductListingCriteriaEvent::class => ['addCriteria', 5000]
        ];
    }

    public function addCriteria(ProductListingCriteriaEvent $event): void
    {
        $criteria = $event->getCriteria();
        $criteria->addAssociation('leasingPlan');
        $criteria->addAggregation(new MinAggregation('min_period_price', 'leasingPlan.periodPrice'));

        // Add the sorting.
        $availableSortings = $event->getCriteria()->getExtension('sortings') ?? new ProductSortingCollection();
        $myCustomSorting = new ProductSortingEntity();
        $myCustomSorting->setId(Uuid::randomHex());
        $myCustomSorting->setActive(true);
        $myCustomSorting->setTranslated(['label' => 'My Custom Sorting at runtime']);
        $myCustomSorting->setKey('my-custom-runtime-sort');
        $myCustomSorting->setPriority(5);
        $myCustomSorting->setFields([
            [
                'field' => 'leasingPlan.periodPrice',
                'order' => 'asc',
                'priority' => 1,
                'naturalSorting' => 0,
            ],
        ]);
        $availableSortings->add($myCustomSorting);
        $event->getCriteria()->addExtension('sortings', $availableSortings);
    }
Is this already the right way to get the min(periodPrice)? Or does it just take a random value from the leasingPlan table to define the sort order?
I didn't find a way to reference the min_period_price aggregate value in $myCustomSorting->setFields().
Update 1
Some days later, I asked a less complex question in the shopware community on slack:
Is it possible to use the DAL to define a subquery for an association in the product-listing?
It should generate something like:
FROM
JOIN (
SELECT ... FROM ... WHERE ... GROUP BY ... ORDER BY ...
) AS ...
The answer there was:
Don't think so
Update 2
I also did an in-depth analysis of the DAL query builder, and it really does not seem possible to perform a subquery with the current version.
Update 3 - Different approach
A different approach might be to define custom fields on the main entity. Every time a change is made to the main entity, the values of these custom fields would have to be recalculated.
This is a lot of overhead, especially when the fields you are adding depend on other data, for example the availability of a product in the store.
So check whether it is worth the extra work. It would be better to have a way to build subqueries.
Unfortunately it seems that in your case there is no easy way to achieve this, if I understand the issue correctly.
Consider the following: for each product you can have multiple leasingPlan entities, and I assume that for a given context (like a specific sales channel or listing) that still holds. This means that you would have to sort the leasingPlan entities by price, then take the one with the lowest price, and then sort the products by their lowest-price leasingPlan's price.
There seems to be no other way to achieve that, and unfortunately for you, sorting is applied at the end, even if it is sort of a subquery.
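To spell out the ordering described above, here is a small in-memory sketch in plain Java (16+) rather than Shopware/PHP code. The LeasingPlan record and the CheapestPlanSort class are hypothetical stand-ins, not DAL classes. It shows the two steps: reduce each product's plans to the lowest periodPrice, then sort the products by that value.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical stand-in for a leasingPlan row: the owning product and its periodPrice.
record LeasingPlan(String productId, double periodPrice) {}

class CheapestPlanSort {
    // Step 1: product id -> lowest periodPrice among that product's plans.
    static Map<String, Double> minPricePerProduct(List<LeasingPlan> plans) {
        return plans.stream().collect(Collectors.toMap(
                LeasingPlan::productId, LeasingPlan::periodPrice, Double::min));
    }

    // Step 2: product ids ordered by that minimum, ascending.
    static List<String> productsByCheapestPlan(List<LeasingPlan> plans) {
        return minPricePerProduct(plans).entrySet().stream()
                .sorted(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .toList();
    }

    public static void main(String[] args) {
        List<LeasingPlan> plans = List.of(
                new LeasingPlan("p1", 30), new LeasingPlan("p1", 10),
                new LeasingPlan("p2", 20),
                new LeasingPlan("p3", 50), new LeasingPlan("p3", 5));
        System.out.println(productsByCheapestPlan(plans)); // prints [p3, p1, p2]
    }
}
```

Note that sorting by whichever plan happens to come first per product would give p2, p1, p3 for the same data, which is exactly the wrong result the question is worried about.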
So, for example, if you have the following snippet
    $criteria = $event->getCriteria();
    $criteria->addAssociation('leasingPlan');
    $criteria->getAssociation('leasingPlan')
        ->addSorting(new FieldSorting('price', FieldSorting::ASCENDING))
        ->setLimit(1)
    ;
The actual price sorting would be applied AFTER the leasingPlan entities are fetched. Essentially only the fetched results would be sorted, meaning that you would not get the cheapest leasing plan per product, but simply the first one.
You can only do something like that with filters, but in this case there is nothing to filter by. I assume you don't have one leasingPlan per SalesChannel or per language, which would let you limit that list to just one entry that could be used for sorting.
That is not to mention that this could not be included in a ProductSortingEntity, but you could always work around that by plugging into the appropriate events and modifying the criteria at runtime.
I see two ways to resolve your issue:
Making another table which would store the cheapest leasingPlan per product and just using that as your association.
Storing the information about the cheapest leasingPlans in e.g. a cache and using that for filtering (caution: a mistake here would probably break the sorting, for example if you end up with too few or too many leasingPlans per product):
    public function applyCustomSorting(ProductListingCriteriaEvent $event): void
    {
        // One leasingPlan per one product
        $cheapestLeasingPlans = $this->myCustomService->getCheapestLeasingPlanIds();
        $criteria = $event->getCriteria();
        $criteria->addAssociation('leasingPlan');
        $criteria->getAssociation('leasingPlan')
            ->addSorting(new FieldSorting('price', FieldSorting::ASCENDING))
            ->addFilter(new EqualsAnyFilter('id', $cheapestLeasingPlans))
        ;
    }
And then you could sort by
$criteria->addSorting(new FieldSorting('leasingPlan.periodPrice', FieldSorting::ASCENDING));
There should be no need to add the association manually or to add the aggregation to the criteria; that should happen automatically behind the scenes if your custom sorting is selected in the storefront.
For more information refer to the official docs.

Spring data - Order by multiplication of columns

I ran into a problem where I need to order by the product of two columns of an entity. For the sake of illustration, the entity is:
    @Entity
    public class Entity {
        @Column(name="amount")
        private BigDecimal amount;
        @Column(name="unitPprice")
        private BigDecimal unitPrice;
        // ... many more columns
    }
My repo interface extends JpaRepository and QuerydslPredicateExecutor,
but I am struggling to find a way to order my data by "amount * unitPrice",
as I can't find a way to put it into
    PageRequest (new Sort.Order(ASC, "amount * unitPrice"))
without getting a PropertyReferenceException: No property amount * unitPrice... thrown.
I can't use a named query, because my query takes quite a massive filter based on user input (I can't put the where clause into the query, because if the user hasn't selected any value, that where clause must not be in the query at all).
To make it simple: I need something like findAll(Predicate, Pageable), but I need to force that query to order by "amount * unitPrice" while leaving my Predicate (filter) and Pageable (offset, limit, other sortings) untouched.
Spring's Sort can be used only for sorting by properties, not by expressions.
But you can create a unique sort in a Predicate, so you can add this sort-predicate to your other one before you call the findAll method.

How to get a SOQL query out of a for loop

Code added.
I have searched endlessly for a solution here and cannot find one, please help!
I have three objects (A, B, and C). A has a lookup to B, and B is the master to C (detail). Both A and C have many records related to each B record.
I want to have a job run that gets a subset of records from object C (it will usually be around 5,000 records). Then go through each of those and get the records on Object A that lookup to the same Object B record, summarize an Object A number field, and put that on the C record.
I have successfully gotten this to work at small scale, with fewer than 100 Object C records. But each Object C record requires a new SOQL query, since I iterate through them in a for loop after I get all the Object C records. Plus I know it is not best practice to ever have a query in a loop.
How can I get this to work? Since the records share the relationship with Object B, is there another way to get the data from the Object A records that match? Or is there some way to pull two lists, one of Object C and one of Object A, then summarize the Object A records and line the lists up somehow?
Thanks in advance!
Code:
    public class nightlyJob {
        public static void updateNumbers(){
            integer I = 29;
            List<ObjectC__c> CUpdateList = new List<ObjectC__c>();
            List<ObjectC__c> CpullList =
                [SELECT ID, Index__c, ObjectB__r.id
                 FROM ObjectC__c
                 WHERE Index__c = :I];
            for(ObjectC__c s : CpullList){
                List<ObjectA__c> AList =
                    [SELECT ObjectB__c, Number__c
                     FROM ObjectA__c
                     WHERE ObjectB__c = :s.ObjectB__r.Id];
                decimal NumSum = 0;
                for(ObjectA__c a : AList){
                    NumSum = a.Number__c + NumSum;
                }
                s.Num__c = NumSum;
                CUpdateList.add(s);
            }
            update CUpdateList;
        }
    }
It looks like you are really missing several fundamental concepts at the moment.
The biggest problem you are up against in SFDC development is that "database" operations are very expensive and are strictly limited. It's not just a matter of "best practice": if in a single transaction you exceed these limits -- number of SOQL calls, number of records returned, number of records updated, number of DML statements, etc. -- your transaction will fail. For details, search online for "Salesforce Execution Governors and Limits".
You can write code that works within these limitations, but there is a bit of a learning curve.
First, learn to use collections with SOQL queries to get your SOQL queries out of loops. This is known as "bulkification" and it is fundamental to SFDC development:
    List<ObjectC__c> CpullList =
        [SELECT ID, Index__c, ObjectB__r.id
         FROM ObjectC__c
         WHERE Index__c = :I];
    // Create a map with the results of this query.
    // key = ObjectC__c.Id, value = ObjectC__c record
    Map<Id, ObjectC__c> objCmap = new Map<Id, ObjectC__c>(CpullList);
    // Build a set of all the Object B ids from this result set
    Set<Id> objBids = new Set<Id>();
    for (ObjectC__c record : CpullList) {
        objBids.add(record.ObjectB__r.id);
    }
    // Now you can use only one SOQL query instead of one per loop iteration
    List<ObjectA__c> AList = [SELECT ObjectB__c, Number__c
                              FROM ObjectA__c
                              WHERE ObjectB__c IN :objBids];
Next, use "SOQL aggregate functions" whenever you can. Example: in your code here, you could use "SUM()" and "group by" instead of performing these calculations with loops:
    // Get the sum of ObjectA__c.Number__c for each Object B in objBids
    AggregateResult[] groupedResults = [select ObjectB__c,
                                               sum(Number__c) sumA
                                        from ObjectA__c
                                        where ObjectB__c in :objBids
                                        group by ObjectB__c];
    for (AggregateResult ar : groupedResults) {
        System.debug('Object B Id' + ar.get('Objectb__c'));
        System.debug('Sum of ObjectA__c.Number__c' + ar.get('sumA'));
        // Here, you might want to build a Map<Id, Integer> sumAmap:
        // key=Object B ID, value=sumA
        // and then use it along with objCmap to build a collection of Object C's
        // for your update statement...
    }
You can continue this process and apply these ideas to make the code more efficient.
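The step hinted at in the comments above (build a map from Object B id to the sum, then use it to fill in the Object C records) can be sketched in plain Java (16+). This is not Apex and does not touch Salesforce; ARecord, CRecord and BulkSum are hypothetical stand-ins for ObjectA__c, ObjectC__c and the job class, and plain strings stand in for record ids. The shape of the logic is what carries over: one pass to build the map, one pass to apply it, and no lookup inside the loop that hits the database.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for ObjectA__c: the parent B id and the number to be summed.
record ARecord(String objectBId, double number) {}

// Hypothetical stand-in for ObjectC__c; `num` plays the role of Num__c.
class CRecord {
    final String id;
    final String objectBId;
    double num;

    CRecord(String id, String objectBId) {
        this.id = id;
        this.objectBId = objectBId;
    }
}

class BulkSum {
    // One pass over the A records: B id -> sum of number (what SUM() ... GROUP BY yields).
    static Map<String, Double> sumByB(List<ARecord> aRecords) {
        Map<String, Double> sums = new HashMap<>();
        for (ARecord a : aRecords) {
            sums.merge(a.objectBId(), a.number(), Double::sum);
        }
        return sums;
    }

    // One pass over the C records: read each sum from the map instead of querying again.
    static void applySums(List<CRecord> cRecords, Map<String, Double> sums) {
        for (CRecord c : cRecords) {
            c.num = sums.getOrDefault(c.objectBId, 0.0); // no A records means a sum of 0
        }
    }

    public static void main(String[] args) {
        List<ARecord> aRecords = List.of(
                new ARecord("b1", 1.5), new ARecord("b1", 2.5), new ARecord("b2", 10.0));
        List<CRecord> cRecords = List.of(new CRecord("c1", "b1"), new CRecord("c2", "b2"));
        applySums(cRecords, sumByB(aRecords));
        for (CRecord c : cRecords) {
            System.out.println(c.id + " -> " + c.num);
        }
    }
}
```

In Apex the final step would be a single update of the modified list, as in the original code.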
But even after you have your methods working as efficiently as possible, you may still run into limits due to the number of records you're dealing with. At that point, you will need to learn about the Batchable interface, the Queueable interface and @future calls (how to process a larger number of records, split across transactions). That's really too much information to cover in a single SO answer.

SingleColumnValueFilter not returning proper number of rows

In our HBase table, each row has a column called crawl identifier. Using a MapReduce job, we only want to process at any one time rows from a given crawl. In order to run the job more efficiently we gave our scan object a filter that (we hoped) would remove all rows except those with the given crawl identifier. However, we quickly discovered that our jobs were not processing the correct number of rows.
I wrote a test mapper to simply count the number of rows with the correct crawl identifier, without any filters. It iterated over all the rows in the table and counted the correct, expected number of rows (~15000). When we took that same job, added a filter to the scan object, the count dropped to ~3000. There was no manipulation of the table itself during or in between these two jobs.
Since adding the scan filter caused the visible rows to change so dramatically, we expect that we simply built the filter incorrectly.
Our MapReduce job features a single mapper:
    public static class RowCountMapper extends TableMapper<ImmutableBytesWritable, Put>{
        public String crawlIdentifier;

        // counters
        private static enum CountRows {
            ROWS_WITH_MATCHED_CRAWL_IDENTIFIER
        }

        @Override
        public void setup(Context context){
            Configuration configuration=context.getConfiguration();
            crawlIdentifier=configuration.get(ConfigPropertyLib.CRAWL_IDENTIFIER_PROPERTY);
        }

        @Override
        public void map(ImmutableBytesWritable legacykey, Result row, Context context){
            String rowIdentifier=HBaseSchema.getValueFromRow(row, HBaseSchema.CRAWL_IDENTIFIER_COLUMN);
            if (StringUtils.equals(crawlIdentifier, rowIdentifier)){
                context.getCounter(CountRows.ROWS_WITH_MATCHED_CRAWL_IDENTIFIER).increment(1L);
            }
        }
    }
The filter setup is like this:
    String crawlIdentifier=configuration.get(ConfigPropertyLib.CRAWL_IDENTIFIER_PROPERTY);
    if (StringUtils.isBlank(crawlIdentifier)){
        throw new IllegalArgumentException("Crawl Identifier not set.");
    }
    // build an HBase scanner
    Scan scan=new Scan();
    SingleColumnValueFilter filter=new SingleColumnValueFilter(HBaseSchema.CRAWL_IDENTIFIER_COLUMN.getFamily(),
        HBaseSchema.CRAWL_IDENTIFIER_COLUMN.getQualifier(),
        CompareOp.EQUAL,
        Bytes.toBytes(crawlIdentifier));
    filter.setFilterIfMissing(true);
    scan.setFilter(filter);
Are we using the wrong filter, or have we configured it wrong?
EDIT: we're looking at manually adding all the column families as per https://issues.apache.org/jira/browse/HBASE-2198 but I'm pretty sure the Scan includes all the families by default.
The filter looks correct, but one scenario that could cause this relates to character encodings. Your filter uses Bytes.toBytes(String), which encodes as UTF-8 [1], whereas HBaseSchema, or the code that writes the record, might use the platform's native character encoding if it calls String.getBytes() [2]. Check that the crawlIdentifier was originally written to HBase using the following, so that the filtered scan compares like for like:
Bytes.toBytes(crawlIdentifier)
[1] http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/util/Bytes.html#toBytes(java.lang.String)
[2] http://docs.oracle.com/javase/1.4.2/docs/api/java/lang/String.html#getBytes()
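A quick way to see how such a mismatch would show up is the plain-Java sketch below. It does not use HBase; ISO-8859-1 is only an assumed example of a non-UTF-8 platform default, and the EncodingMismatch class and the sample identifiers are made up. A byte-wise EQUAL comparison only matches when both sides were encoded the same way, so identifiers with non-ASCII characters would silently drop out of a filtered scan while pure-ASCII ones still match.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class EncodingMismatch {
    // True if the value encodes to the same bytes in UTF-8 and in ISO-8859-1
    // (ISO-8859-1 stands in for whatever the platform default might be).
    static boolean sameBytes(String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = value.getBytes(StandardCharsets.ISO_8859_1);
        return Arrays.equals(utf8, latin1);
    }

    public static void main(String[] args) {
        // Pure ASCII: both encodings agree, so a byte comparison would match.
        System.out.println(sameBytes("crawl-2013"));      // prints true
        // Contains U+00E9: 2 bytes in UTF-8, 1 byte in ISO-8859-1, so no match.
        System.out.println(sameBytes("crawl-\u00e9t\u00e9")); // prints false
    }
}
```

If all of your crawl identifiers are plain ASCII, this particular explanation probably does not apply and the cause lies elsewhere.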

Using LINQ Expression Instead of NHibernate.Criterion

If I were to select some rows based on certain criteria I can use ICriterion object in NHibernate.Criterion, such as this:
    public List<T> GetByCriteria()
    {
        SimpleExpression newJobCriterion =
            NHibernate.Criterion.Expression.Eq("LkpStatu", statusObject);
        ICriteria criteria = Session.GetISession().CreateCriteria(typeof(T)).SetMaxResults(maxResults);
        criteria.Add(newJobCriterion);
        return criteria.List<T>();
    }
Or I can use LINQ's where clause to filter what I want:
    public List<T> GetByCriteria_LINQ()
    {
        ICriteria criteria = Session.GetISession().CreateCriteria(typeof(T)).SetMaxResults(maxResults);
        return criteria.Where(item => item.LkpStatu == statusObject).ToList();
    }
I would prefer the second one, of course, because:
It gives me strong typing.
I don't need to learn yet another syntax in the form of NHibernate criteria.
The question is: is there any performance advantage of the first one over the second one? From what I know, the first one will create SQL queries, so it filters the data before it is passed into memory. Is this kind of performance saving big enough to justify its use?
As usual, it depends. First, note that in your second snippet .List() is missing right after return criteria. Also note that you won't get the same results from both examples. The first one applies the where clause and then returns the top maxResults; the second one first selects the top maxResults and then applies the where clause.
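The difference in results is easy to reproduce with in-memory data. The sketch below uses plain Java (16+) streams rather than NHibernate or LINQ, and the FilterVsLimit class and the sample status values are made up for illustration. It only demonstrates the ordering effect: filtering and then limiting is not the same as limiting and then filtering.

```java
import java.util.List;
import java.util.function.Predicate;

class FilterVsLimit {
    // Like the Criteria version: apply the where clause, then take the top maxResults.
    static <T> List<T> filterThenLimit(List<T> rows, Predicate<T> where, int maxResults) {
        return rows.stream().filter(where).limit(maxResults).toList();
    }

    // Like the second snippet: take the top maxResults first, then filter in memory.
    static <T> List<T> limitThenFilter(List<T> rows, Predicate<T> where, int maxResults) {
        return rows.stream().limit(maxResults).filter(where).toList();
    }

    public static void main(String[] args) {
        List<String> statuses = List.of("new", "done", "new", "done", "new");
        Predicate<String> isNew = s -> s.equals("new");
        System.out.println(filterThenLimit(statuses, isNew, 2)); // prints [new, new]
        System.out.println(limitThenFilter(statuses, isNew, 2)); // prints [new]
    }
}
```

With limit-then-filter you can end up with fewer than maxResults matches even though more matching rows exist in the table.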
If your expected result set is relatively small and you are likely to use some of the results in lazy loads, then it's actually better to take the second approach, because all entities loaded through a session stay in its first-level cache.
Usually, however, you don't do it this way and use the first approach.
Perhaps you wanted to use NHibernate.Linq (located in the Contrib project), which translates LINQ to Criteria for you.
I combined the two and came up with this:
var crit = _session.CreateCriteria(typeof (T)).SetMaxResults(100);
return (from x in _session.Linq<T>(crit) where x.field == <something> select x).ToList();
