Ruby ActiveRecord: retrieving newly inserted records in batches?

This pertains to my previous query.
I am able to get batches of records from ActiveRecord using the following query:
Client.offset(15 * iteration).first(15)
I'm running into an issue whereby a user inputs a new record.
Say there are 100 records, and the batch size is 15.
The first 6 iterations (90 records) and the last iteration (10 records) work. However, if a new entry comes in (making it a total of 101 records), the program fails as it will either:
a) If the iteration counter is set to increment, it will query for records beyond the range of 101, resulting in nothing being returned.
b) If the iteration counter is modified to increment only when an entire batch of 15 items is complete, it will repeat the last 14 items plus the 1 new item.
How do I go about getting newly posted records?
Dart code:
_getData() async {
  if (!isPerformingRequest) {
    setState(() {
      isPerformingRequest = true;
    });
    // _var represents the iteration counter - 0, 1, 2 ...
    // fh is a helper method returning records in a List<String>
    List<String> newEntries = await fh.feed(_var);
    // Count the number of returned records.
    _count += newEntries.length;
    if (newEntries.isEmpty) {
      // ..
    } else {
      // If the number of records is 10, increment _var to go to the next batch of 10.
      if (_count == 10) {
        _var++;
        _count = 0;
      }
      // But if the count is not 10, stay on the same batch
      // (previously counted/displayed records will be shown again).
    }
    setState(() {
      items.addAll(newEntries);
      isPerformingRequest = false;
    });
  }
}
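For reference, the drift described above is the usual weakness of offset-based paging: every new insert shifts the page boundaries. A common workaround is keyset paging, where the client remembers the highest id it has already shown and asks only for records after it, so new rows simply arrive in a later batch and nothing already shown is repeated. Below is a minimal sketch of the idea, written in Java purely for illustration; the repository interface and all names are hypothetical, and the ActiveRecord equivalent would filter on id greater than the last seen id and order by id.

import java.util.List;

// Hypothetical record type and repository; only the shape of the query matters.
record ClientRecord(long id, String name) {}

interface ClientRepository {
    // e.g. SELECT * FROM clients WHERE id > :lastSeenId ORDER BY id LIMIT :pageSize
    List<ClientRecord> findAfter(long lastSeenId, int pageSize);
}

class KeysetPager {
    private static final int PAGE_SIZE = 15;
    private final ClientRepository repo;
    private long lastSeenId = 0; // highest id already delivered to the UI

    KeysetPager(ClientRepository repo) {
        this.repo = repo;
    }

    // Returns the next batch; records inserted after the last call show up in a
    // later batch, and already-shown records are never repeated.
    List<ClientRecord> nextBatch() {
        List<ClientRecord> batch = repo.findAfter(lastSeenId, PAGE_SIZE);
        if (!batch.isEmpty()) {
            lastSeenId = batch.get(batch.size() - 1).id();
        }
        return batch;
    }
}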

Related

How can I speed up the performance of Spring JPA?

The code
int offset = 0;
int bulkSize = eodFileConfig.getBulkSize(); // sample of 10k
setDateTimeFormat();
//Get total record from Temp table
long max = extQRMerchantTrxHistService.getTotalRecords();
do {
    log.debug("[execute] start to write to actual table");
    // bulk size represent how many items in a page, offset is the page
    Page<ExtensionQRMerchantTrxHistEntity> records = extQRMerchantTrxHistService.findRecordsWithPagination(offset, bulkSize);
    List<TransactionHistoryExtEntity> transactions = new ArrayList<>();
    for (ExtensionQRMerchantTrxHistEntity tempEntity : records) {
        log.debug("Record: {} ", tempEntity);
        Date a = new Date();
        //Query from T_TRXN_DETAIL_EXT
        Date dateTime = sf.parse(tempEntity.getTransactionDate());
        List<TransactionHistoryExtEntity> histories =
                transactionHistoryInquiryService.retrieveHistoryBasedOnRefNoDateAmt(
                        tempEntity.getTransactionRefNo(), dateTime, tempEntity.getTransactionAmount());
        Date b = new Date();
        System.out.println("Query Time: " + Math.abs(a.getTime() - b.getTime()));
        Date c = new Date();
        TransactionHistoryExtEntity transaction;
        if (histories.isEmpty()) {
            //Insert record
            transaction = setTransactionHistory(Boolean.TRUE, tempEntity, null);
        } else {
            //Update record
            transaction = setTransactionHistory(Boolean.FALSE, tempEntity, histories.get(0));
        }
        Date d = new Date();
        System.out.println("Query Time: " + Math.abs(c.getTime() - d.getTime()));
        Date e = new Date();
        //transactionHistoryExtRepository.saveAndFlush(transaction);
        Date f = new Date();
        System.out.println("Query Time: " + Math.abs(e.getTime() - f.getTime()));
        //Add to list
        transactions.add(transaction);
    }
    //Save & Update all records
    transactionHistoryExtRepository.saveAll(transactions);
    offset++;
} while ((long) offset * bulkSize < max);
The query
List<TransactionHistoryExtEntity> findTopByReferenceNumberAndTransactionDateAndAmountOrderByTransactionDateDesc(
        String referenceNo, Date transactionDate, BigDecimal amount);
I am a bit new to this Spring Boot stuff. I am trying to insert/update 50k-100k records into the table. Running 10k seems fast enough; however, as the size increases, the time taken by the query part in the inner for loop increases. For 50k records with a 10k bulk size, the first 10k (the entire inner iteration) took about 80 seconds to complete, followed by 472 seconds for the next 10k, while the last 10k took 1,000+ seconds to process. Can anyone explain what is causing the issue? Also, I cleared my table before executing this, meaning it is empty when I run this.
As a result, I tried different bulk sizes such as 30k and 50k. At a 30k bulk size, which means 2 outer loops, everything completes in about 38 minutes; a 50k bulk size takes about 33 minutes, whereas 10k takes about an hour for 50k records. But this would defeat the purpose of having pagination in the first place.
So I noticed that the average query time spiked after the saveAll method. I am not too sure if it is due to saveAll or the size of the records. Without the query part of the inner loop, the entire process takes only about 1 minute to complete.
Does anyone have any idea regarding the issue and how I can improve the performance?
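One pattern worth checking for this kind of slowdown is the size of the Hibernate persistence context: every entity passed to saveAll stays managed, so dirty checking (and with it every later operation in the loop) gets slower as the chunks accumulate. Below is a minimal sketch of flushing and clearing the context after each chunk, assuming an EntityManager can be injected alongside the repository used above; the wrapper class name is made up. It is also worth confirming that T_TRXN_DETAIL_EXT has an index covering the reference number, transaction date and amount columns, since that lookup runs once per temp-table row.

import java.util.List;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;
import org.springframework.transaction.annotation.Transactional;

public class ChunkWriter {

    @PersistenceContext
    private EntityManager entityManager;

    private final TransactionHistoryExtRepository transactionHistoryExtRepository;

    public ChunkWriter(TransactionHistoryExtRepository transactionHistoryExtRepository) {
        this.transactionHistoryExtRepository = transactionHistoryExtRepository;
    }

    // Persist one chunk, then detach everything so the persistence context does
    // not keep every previously saved entity alive between chunks.
    @Transactional
    public void writeChunk(List<TransactionHistoryExtEntity> transactions) {
        transactionHistoryExtRepository.saveAll(transactions);
        entityManager.flush();  // push the pending inserts/updates to the database
        entityManager.clear();  // detach the managed entities accumulated so far
    }
}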

Why is there consistent variation in execution time on my timed trigger?

I have a timed trigger that runs every 15 minutes. A simplified partial version of the script is shown below. The script compiles data from about 50 other spreadsheets and records a row for each spreadsheet, then writes that summary data to the active spreadsheet.
I noticed that in the logs, there is an alternating pattern in the execution times for this script: half the executions take 200-400 seconds, and the other half typically take 700-900 seconds. It's a pretty significant difference, and the pattern persists over the past several days of logs.
There's nothing in the script itself that changes from one execution to the next, so I'm curious if anyone can suggest a reason this would happen (even better if it's a documented reason). For example, is there some sort of caching of the spreadsheet reads so that the next execution gets those values faster?
// The triggered function.
function updateRankings()
{
  var rankingSheet = SS.getSheetByName(RANKING_SHEET_NAME) // SS is the active spreadsheet
  // Read the id's of the target spreadsheets, which are stored on an external spreadsheet
  var gyms = getRowsData(SpreadsheetApp.openById(ADMIN_PANEL_ID).getSheetByName(ADMIN_PANEL_SHEET_NAME))
  // Iterate over gyms
  gyms.forEach(getGymStats)
  // Write the compiled data back to the active sheet
  setRowsData(rankingSheet, gyms)
}

function getGymStats(gym)
{
  var gymSpreadsheet = SpreadsheetApp.openById(gym.spreadsheetId)
  // Force spreadsheet formulas to calculate before reading values
  SpreadsheetApp.flush()
  var metricsSheet = gymSpreadsheet.getSheetByName('Detailed Metrics')
  var statsColumn = metricsSheet.getRange('E:E').getValues()
  var roasColumn = metricsSheet.getRange('J:J').getValues()
  // Get stats
  var gymStats = {
    facebookAdSpend: getFacebookAdSpend(gymSpreadsheet),
    scheduling: statsColumn[8][0],
    showup: statsColumn[9][0],
    closing: statsColumn[10][0],
    costPerLead: statsColumn[25][0],
    costPerAppointment: statsColumn[26][0],
    costPerShow: statsColumn[27][0],
    costPerAcquisition: statsColumn[28][0],
    leadCount: statsColumn[13][0],
    frontEndRoas: (roasColumn[21][0] / statsColumn[5][0]) || 0,
    totalRoas: (roasColumn[35][0] / statsColumn[5][0]) || 0,
    totalProjectedRoas: (roasColumn[36][0] / statsColumn[5][0]) || 0,
    conversionRate: (gym.currency ?
      '=IFS(ISBLANK(INDIRECT("R[0]C[-4]", FALSE)),,ISBLANK(INDIRECT("R[0]C[-2]", FALSE)), 1,TRUE, IFERROR(GOOGLEFINANCE("Currency:"&INDIRECT("R[0]C[-2]", FALSE)&"USD")))' :
      1)
  }
  Object.assign(gym, gymStats)
}

function getFacebookAdSpend(spreadsheet)
{
  var range = spreadsheet.getRangeByName('FacebookAdSpend')
  if (!range) return ''
  return range.getValue()
}

Total row count for Oracle NoSQL node driver

I'm running a very simple query in Oracle NoSQL (NodeJs driver):
let count_statement = `SELECT count(*) FROM ${resource}`;
let res = await this.client.query(count_statement, {});
This returns 0 rows (and thus no count). If I run the query without the count, I get back rows I can iterate over. Is there no way to get the total results for a query?
I don't want to count the results WITHIN the row. I need the number of rows that match this query (which is all of them in this query).
In order to get the full result set, the code should call client.query in a loop until the continuationKey in the result object is null, as in this example from the NoSQL Node SDK:
/*
 * Execute a query and print the results. Optional limit parameter is used to
 * limit the results of each query API invocation to that many rows.
 */
async function runQuery(client, stmt, limit) {
    const opt = { limit };
    let res;
    console.log('Query results:');
    do {
        // Issue the query
        res = await client.query(stmt, opt);
        // Each call to NoSQLClient.query returns a portion of the
        // result set. Iterate over the result set, using the
        // QueryResult.continuationKey in the query's option parameter,
        // until the result set is exhausted and continuation key is
        // null.
        for (let row of res.rows) {
            console.log('  %O', row);
        }
        opt.continuationKey = res.continuationKey;
    } while (res.continuationKey != null);
}
You can also use the new queryIterable syntax, introduced in version 5.3.0:
const count_statement = `SELECT count(*) FROM ${resource}`;
const rows = [];
for await (const res of client.queryIterable(count_statement)) {
    rows.push.apply(rows, res.rows);
}
return rows;

Linq - How to prevent locks when bulk deleting

In my code logic, I first delete a large number of records with multiple queries and then do a bulk insert.
Here is the code:
using (var scope = new TransactionScope())
{
    using (var ctx = new ApplicationDbContext(schemaName))
    {
        // Delete
        foreach (var item in queries)
        {
            // Delete queries - more than 30 - optimized already
            ctx.Database.ExecuteSqlCommand(item);
        }
        // Bulk Insert
        BulkInsert(ConnectionString, "Entry", "Entry", bulkTOEntry);
        BulkInsert(ConnectionString, "WrongEntry", "WrongEntry", bulkWrongEntry);
    }
    scope.Complete();
}
The problem here is in the delete part. The delete queries are taking around 10 minutes. This locks the records, which keeps other users from fetching or manipulating them.
I have my code in a TransactionScope so that if there is any error while deleting, the whole transaction is rolled back.
I have tried to delete the records in chunks through stored procedures, but that didn't help here, as there is still a lock on the records due to the TransactionScope.
How can I prevent locks on the records?
Sample of the delete queries:
DELETE FROM [Entry]
WHERE CompanyId = 1
AND EmployeeId IN (3, 4, 6, 7, 14, 17, 20, 21, 22,....100 more)
AND Entry_Date = '2016-12-01'
AND Entry_Method = 'I'
If you need to delete the employees in chunks, you can split the list of employees with this:
public static List<IEnumerable<T>> Partition<T>(this IEnumerable<T> source, int length)
{
    var count = source.Count();
    var numberOfPartitions = count / length + (count % length > 0 ? 1 : 0);
    List<IEnumerable<T>> result = new List<IEnumerable<T>>();
    for (int i = 0; i < numberOfPartitions; i++)
    {
        result.Add(source.Skip(length * i).Take(length));
    }
    return result;
}
You can use this method to split the list into small chunks and delete them one chunk at a time, so other users can use the table between chunks.
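For illustration only, here is the chunk-and-commit shape described above, sketched in plain Java/JDBC rather than the EF stack used in the question: each chunk of employee ids is deleted in its own short transaction, so locks are released between chunks instead of being held for the whole run. The table and column values are taken from the sample delete query; everything else is hypothetical.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class ChunkedDelete {

    // Deletes the entries for the given employee ids in chunks, committing after
    // each chunk so locks are only held for the duration of one small delete.
    static void deleteInChunks(Connection conn, List<Integer> employeeIds, int chunkSize) throws SQLException {
        conn.setAutoCommit(false);
        String sql = "DELETE FROM [Entry] WHERE CompanyId = ? AND EmployeeId = ? "
                   + "AND Entry_Date = ? AND Entry_Method = ?";
        for (int start = 0; start < employeeIds.size(); start += chunkSize) {
            List<Integer> chunk = employeeIds.subList(start, Math.min(start + chunkSize, employeeIds.size()));
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                for (int employeeId : chunk) {
                    ps.setInt(1, 1);                                    // CompanyId from the sample query
                    ps.setInt(2, employeeId);
                    ps.setDate(3, java.sql.Date.valueOf("2016-12-01")); // Entry_Date from the sample query
                    ps.setString(4, "I");                               // Entry_Method from the sample query
                    ps.addBatch();
                }
                ps.executeBatch();
            }
            conn.commit(); // one short transaction per chunk keeps lock time small
        }
    }
}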

Max sequence from a view containing multiple records using Linq lambda

I've been at this for a while. I have a data set that has a recurring key and a sequence similar to this:
id  status      sequence
1   open        1
1   processing  2
2   open        1
2   processing  2
2   closed      3
A new row is added for each 'action' that happens, so the various ids can have variable sequences. I need to get the max sequence number for each id, but I still need to return the complete record.
I want to end up with sequence 2 for id 1, and sequence 3 for id 2.
I can't seem to get this to work without selecting the distinct ids, then looping through the results, ordering the values and then adding the first item to another list, but that's so slow.
var ids = this.ObjectContext.TNTP_FILE_MONITORING.Select(i => i.FILE_EVENT_ID).Distinct();
List<TNTP_FILE_MONITORING> vals = new List<TNTP_FILE_MONITORING>();
foreach (var id in ids)
{
    vals.Add(this.ObjectContext.TNTP_FILE_MONITORING
        .Where(mfe => mfe.FILE_EVENT_ID == id)
        .OrderByDescending(mfe => mfe.FILE_EVENT_SEQ)
        .First<TNTP_FILE_MONITORING>());
}
There must be a better way!
Here's what worked for me:
var ts = new[] { new T(1, 1), new T(1, 2), new T(2, 1), new T(2, 2), new T(2, 3) };
var q =
    from t in ts
    group t by t.ID into g
    let max = g.Max(x => x.Seq)
    select g.FirstOrDefault(t1 => t1.Seq == max);
(Just need to apply that to your datatable, but the query stays about the same)
Note that with your current method, because you are iterating over all records, you also pull all records from the datastore. By using a query like this, you allow it to be translated into a query against the datastore, which is not only faster but also returns only the results you need (assuming you are using Entity Framework or Linq2SQL).