The code
int offset = 0;
int bulkSize = eodFileConfig.getBulkSize(); // sample of 10k
setDateTimeFormat();
//Get total record from Temp table
long max = extQRMerchantTrxHistService.getTotalRecords();
do {
log.debug("[execute] start to write to actual table");
// bulk size represent how many items in a page, offset is the page
Page<ExtensionQRMerchantTrxHistEntity> records = extQRMerchantTrxHistService.findRecordsWithPagination(offset, bulkSize);
List<TransactionHistoryExtEntity> transactions = new ArrayList<>();
for (ExtensionQRMerchantTrxHistEntity tempEntity : records) {
log.debug("Record: {} ", tempEntity);
Date a = new Date();
//Query from T_TRXN_DETAIL_EXT
Date dateTime = sf.parse(tempEntity.getTransactionDate());
List<TransactionHistoryExtEntity> histories =
transactionHistoryInquiryService.retrieveHistoryBasedOnRefNoDateAmt(
tempEntity.getTransactionRefNo(), dateTime, tempEntity.getTransactionAmount());
Date b = new Date();
System.out.println("Query Time: " + Math.abs(a.getTime() - b.getTime()));
Date c = new Date();
TransactionHistoryExtEntity transaction;
if (histories.isEmpty()) {
//Insert record
transaction = setTransactionHistory(Boolean.TRUE, tempEntity, null);
} else {
//Update record
transaction = setTransactionHistory(Boolean.FALSE, tempEntity, histories.get(0));
}
Date d = new Date();
System.out.println("Query Time: " + Math.abs(c.getTime() - d.getTime()));
Date e = new Date();
//transactionHistoryExtRepository.saveAndFlush(transaction);
Date f = new Date();
System.out.println("Query Time: " + Math.abs(e.getTime() - f.getTime()));
//Add to list
transactions.add(transaction);
}
//Save & Update all records
transactionHistoryExtRepository.saveAll(transactions);
offset++;
} while ((long) offset * bulkSize < max);
The query
List<TransactionHistoryExtEntity> findTopByReferenceNumberAndTransactionDateAndAmountOrderByTransactionDateDesc(
String referenceNo, Date transactionDate, BigDecimal amount);
I am fairly new to Spring Boot. I am trying to insert/update 50k-100k records into a table. Running 10k records seems fast enough, but as the size increases, the time taken by the query part in the inner for loop increases. For 50k records with a bulk size of 10k, the first 10k took about 80 seconds to complete (the entire inner iteration), the next 10k took 472 seconds, and the last 10k took over 1,000 seconds. Can anyone explain what is causing this? Also, I cleared the table before executing this, so it is empty when I run it.
I then tried different bulk sizes such as 30k and 50k. For 50k records, a 30k bulk size (2 outer loops) completes in about 38 minutes, a 50k bulk size in about 33 minutes, and a 10k bulk size in about an hour. But using such large bulk sizes would defeat the purpose of having pagination in the first place.
I noticed that the average query time spiked after the saveAll method. I am not sure whether this is due to saveAll or to the number of records. Without the query in the inner loop, the entire process takes only about 1 minute to complete.
Does anyone have any idea regarding the issue and how I can increase the performance?
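One possible direction, offered as a sketch rather than a confirmed diagnosis: the inner loop issues one lookup query per record, and with Hibernate's default AUTO flush mode a query can first trigger a dirty check over every entity the session is still managing, which would fit the lookup time growing after each saveAll. Fetching the existing rows once per page and matching them in memory avoids the per-record query. The plain-Java example below shows only the matching step; Row, BatchLookup and their methods are hypothetical stand-ins, not the classes from the question.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BatchLookup {
    // Hypothetical stand-in for both the temp-table row and the history row.
    static final class Row {
        final String refNo;
        final String date;
        final BigDecimal amount;

        Row(String refNo, String date, BigDecimal amount) {
            this.refNo = refNo;
            this.date = date;
            this.amount = amount;
        }

        // Composite key mirroring the (reference number, date, amount) lookup.
        // stripTrailingZeros makes 10.0 and 10.00 produce the same key.
        String key() {
            return refNo + "|" + date + "|" + amount.stripTrailingZeros().toPlainString();
        }
    }

    // Index the rows fetched once for the whole page; the first match wins.
    static Map<String, Row> index(List<Row> existing) {
        Map<String, Row> byKey = new HashMap<>();
        for (Row row : existing) {
            byKey.putIfAbsent(row.key(), row);
        }
        return byKey;
    }

    // Select the page rows that do (or do not) have a match in the index.
    static List<Row> select(List<Row> page, Map<String, Row> byKey, boolean matched) {
        List<Row> out = new ArrayList<>();
        for (Row row : page) {
            if (byKey.containsKey(row.key()) == matched) {
                out.add(row);
            }
        }
        return out;
    }

    // Rows with no existing history: candidates for insert.
    static List<Row> toInsert(List<Row> page, Map<String, Row> byKey) {
        return select(page, byKey, false);
    }

    // Rows with existing history: candidates for update.
    static List<Row> toUpdate(List<Row> page, Map<String, Row> byKey) {
        return select(page, byKey, true);
    }
}
```

Flushing and clearing the persistence context after each page (EntityManager.flush() followed by clear()) is another commonly suggested step; whether it applies here depends on how the transaction is scoped, which the question does not show.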
I have a table called transactions with two fields relevant to my question, start_timestamp and end_timestamp. I need to sum the time elapsed across all transactions where end_timestamp is not null, so the result should be something like: Total Time of Transactions: 1 hour and 18 minutes
I've tried using Carbon, but I don't know how to sum all the rows of the table using it.
foreach($timestampStarts as $timestampStart){
$time = new Carbon($timestampStart->start_timestamp);
$shift_end_time =new Carbon($timestampStart->end_timestamp);
dd($time->diffForHumans($shift_end_time));
}
You can use the MySQL TIMESTAMPDIFF function to calculate the difference:
Transaction::whereNotNull('end_timestamp')
->sum(DB::raw('TIMESTAMPDIFF(SECOND, start_timestamp, end_timestamp)'));
This will give you the total in seconds.
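Turning that total into text such as "1 hour and 18 minutes" is plain integer arithmetic. A minimal sketch in Java (the same division and remainder steps carry over to PHP); TotalTime and format are hypothetical names used only for illustration:

```java
class TotalTime {
    // Turn a total number of seconds into text such as "1 hour and 18 minutes".
    // Leftover seconds below a full minute are dropped.
    static String format(long totalSeconds) {
        long hours = totalSeconds / 3600;
        long minutes = (totalSeconds % 3600) / 60;
        return hours + (hours == 1 ? " hour" : " hours")
                + " and "
                + minutes + (minutes == 1 ? " minute" : " minutes");
    }

    public static void main(String[] args) {
        // 4680 seconds = 1 * 3600 + 18 * 60
        System.out.println("Total Time of Transactions: " + format(4680));
    }
}
```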
You need the total difference in minutes: add up the difference for each row, then get the human-readable form of the total. For example:
$diffForHumans = null;
$nowDate = Carbon::now();
$diffInMinutes = 0;
foreach($timestampStarts as $timestampStart){
if(!empty($timestampStart->end_timestamp)){ // if the end timestamp is not empty
$time = new Carbon($timestampStart->start_timestamp);
$shift_end_time =new Carbon($timestampStart->end_timestamp);
//Adding difference in minutes
$diffInMinutes+= $shift_end_time->diffInMinutes($time);
}
}
$totalDiffForHumans = $nowDate->addMinutes($diffInMinutes)->diffForHumans();
This pertains to my previous query.
I am able to get batches of records from ActiveRecord using the following query:
Client.offset(15 * iteration).first(15)
I'm running into an issue whereby a user inputs a new record.
Say there are 100 records, and batches are of 15.
The first 6 iterations (90 records) and the last iteration (10 records) work. However, if a new entry comes in (making a total of 101 records), the program fails in one of two ways:
a) If the iteration counter is set to increment, it will query for records beyond the range of 101, resulting in nothing being returned.
b) If the iteration counter is modified to increment only when an entire batch of 15 items is complete, it will repeat the last 14 items plus 1 new item.
How do I go about getting newly posted records?
dart code:
_getData() async {
if (!isPerformingRequest) {
setState(() {
isPerformingRequest = true;
});
//_var represents the iteration counter - 0, 1, 2 ...
//fh is a helper method returning records in List<String>
List<String> newEntries = await fh.feed(_var);
//Count number of returned records.
newEntries.forEach((i) => _count++);
if (newEntries.isEmpty) {
..
}
} else {
//if number of records is 10, increment _var to go to the next batch of 10.
if(_count == 10) {
_var++;
_count = 0;
}
//but if count is not 10, stay with the same batch (but previously counted/display records will be shown again)
}
setState(() {
items.addAll(newEntries);
isPerformingRequest = false;
});
}
}
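One way around both failure modes is to stop paging by offset and page by the last id seen instead, often called keyset or cursor pagination. This swaps in a different technique from the offset query in the question and assumes records have an increasing id and that new records get higher ids. In ActiveRecord that might look like Client.where("id > ?", last_id).order(:id).limit(15). The sketch below simulates the idea in Java over an in-memory list; KeysetPager and nextBatch are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class KeysetPager {
    // Return up to `limit` ids greater than `afterId`, in ascending order.
    // Stands in for: WHERE id > :afterId ORDER BY id LIMIT :limit
    static List<Integer> nextBatch(List<Integer> sortedIds, int afterId, int limit) {
        List<Integer> batch = new ArrayList<>();
        for (int id : sortedIds) {
            if (id > afterId) {
                batch.add(id);
                if (batch.size() == limit) {
                    break;
                }
            }
        }
        return batch;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 1; i <= 100; i++) {
            ids.add(i);
        }
        int lastId = 0; // nothing seen yet
        List<Integer> batch;
        while (!(batch = nextBatch(ids, lastId, 15)).isEmpty()) {
            lastId = batch.get(batch.size() - 1); // remember the cursor
        }
        ids.add(101); // a new record arrives after the last batch
        System.out.println(nextBatch(ids, lastId, 15)); // only the new record
    }
}
```

Because the client remembers the last id rather than a page number, a short final batch needs no special handling: the next request simply asks for anything newer.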
I am working on a Spring-MVC application using Hibernate as ORM and a PostgreSQL database. Along with some values in each database row, I also save the timestamp (java.sql.Timestamp) of when the entry was made. I want to remove entries which are older than 5 minutes.
How can I write an HQL query that compares against a Timestamp, something like if timestamp>oldTimestamp then delete that row? This is what I have so far:
@Override
@Scheduled(fixedRate = 100000)
public void removeStaleLocks() {
session = this.sessionFactory.getCurrentSession();
//Timestamp timestamp = // current timestamp;
Query query = session.createQuery("delete from NoteLock as nl where nl.timeStamp=:timeStamp");
query.setParameter("timeStamp",timeStamp);
query.executeUpdate();
session.flush();
}
What I would like to do is pass the query a parameter and use it as the current timestamp, then delete all notes that are more than 5 minutes old. Any help would be nice. Thanks a lot.
long now = System.currentTimeMillis();
long nowMinus5Minutes = now - (5L * 60L * 1000L);
Timestamp nowMinus5MinutesAsTimestamp = new Timestamp(nowMinus5Minutes);
Query query = session.createQuery("delete from NoteLock as nl where nl.timeStamp < :limit");
query.setParameter("limit", nowMinus5MinutesAsTimestamp);
query.executeUpdate();
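The same cutoff can also be computed with java.time, which avoids the manual millisecond arithmetic. This is a minimal sketch, not part of the answer above; StaleCutoff and cutoff are hypothetical names, and passing the current time in as a parameter is only there to make the calculation easy to test.

```java
import java.sql.Timestamp;
import java.time.Duration;
import java.time.Instant;

class StaleCutoff {
    // Cutoff = now minus maxAge; rows with a timestamp before it count as stale.
    static Timestamp cutoff(Instant now, Duration maxAge) {
        return Timestamp.from(now.minus(maxAge));
    }

    public static void main(String[] args) {
        Timestamp limit = cutoff(Instant.now(), Duration.ofMinutes(5));
        // The result would be bound to the :limit parameter of the delete query.
        System.out.println("Delete rows with timeStamp < " + limit);
    }
}
```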
Let's say I have a script that iterates over a list of 400 objects.
Each object has anywhere from 1 to 10 properties.
Each property is a reasonably sized string or a somewhat large integer.
Is there a significant difference in performance between saving these objects into ScriptDB and saving them into a Spreadsheet (without doing it in one bulk operation)?
Executive Summary
Yes, there is a significant difference! Huge! And I have to admit that this experiment didn't turn out the way I expected.
With this amount of data, writing to a spreadsheet was always much faster than using ScriptDB.
These experiments support the assertions regarding bulk operations in the Google Apps Script Best Practices. Saving data in a spreadsheet using a single setValues() call was 75% faster than line-by-line, and two orders of magnitude faster than cell-by-cell.
On the other hand, recommendations to use Spreadsheet.flush() should be considered carefully, due to the performance impact. In these experiments, a single write of a 4000-cell spreadsheet took less than 50ms, and adding a call to flush() increased that to 610ms - still less than a second, but an order of magnitude tax seems ludicrous. Calling flush() for each of the 400 rows in the sample spreadsheet made the operation take almost 12 seconds, when it took just 164 ms without it. If you've been experiencing Exceeded maximum execution time errors, you may benefit from both optimizing your code AND removing calls to flush().
Experimental Results
All timings were derived following the technique described in How to measure time taken by a function to execute. Times are expressed in milliseconds.
Here are the results from a single pass of five different approaches, two using ScriptDB, three writing to Spreadsheets, all with the same source data. (400 objects with 5 String & 5 Number attributes)
Experiment 1
Elapsed time for ScriptDB/Object test: 53529
Elapsed time for ScriptDB/Batch test: 37700
Elapsed time for Spreadsheet/Object test: 145
Elapsed time for Spreadsheet/Attribute test: 4045
Elapsed time for Spreadsheet/Bulk test: 32
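Taking the Experiment 1 timings above at face value, the comparisons between the three spreadsheet approaches can be checked with a few lines of arithmetic. The sketch below is in Java rather than Apps Script, and Speedup, ratio and percentSaved are hypothetical helper names.

```java
class Speedup {
    // How many times longer the slow approach takes than the fast one.
    static double ratio(double slowMs, double fastMs) {
        return slowMs / fastMs;
    }

    // Percentage of the slow approach's time saved by the fast one.
    static double percentSaved(double slowMs, double fastMs) {
        return 100.0 * (slowMs - fastMs) / slowMs;
    }

    public static void main(String[] args) {
        // Bulk (32 ms) against row-at-a-time (145 ms): roughly 78% less time.
        System.out.printf("Saved vs Object: %.1f%%%n", percentSaved(145, 32));
        // Bulk (32 ms) against cell-at-a-time (4045 ms): roughly 126 times faster.
        System.out.printf("Ratio vs Attribute: %.1fx%n", ratio(4045, 32));
    }
}
```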
Effect of Spreadsheet.flush()
Experiment 2
In this experiment, the only difference from Experiment 1 was that we called Spreadsheet.flush() after every setValue/s call. The cost of doing so is dramatic, (around 700%) but does not change the recommendation to use a spreadsheet over ScriptDB for speed reasons, because writing to spreadsheets is still faster.
Elapsed time for ScriptDB/Object test: 55282
Elapsed time for ScriptDB/Batch test: 37370
Elapsed time for Spreadsheet/Object test: 11888
Elapsed time for Spreadsheet/Attribute test: 117388
Elapsed time for Spreadsheet/Bulk test: 610
Note: This experiment was often killed with Exceeded maximum execution time.
Caveat Emptor
You're reading this on the interwebs, so it must be true! But take it with a grain of salt.
These are results from very small sample sizes, and may not be completely reproducible.
These results are measuring something that changes constantly - while they were observed on Feb 28 2013, the system they measured could be completely different when you read this.
The efficiency of these operations is affected by many factors that are not controlled in these experiments; caching of instructions & intermediate results and server load, for example.
Maybe, just maybe, someone at Google will read this, and improve the efficiency of ScriptDB!
The Code
If you want to perform (or better yet, improve) these experiments, create a blank spreadsheet, and copy this into a new script within it. This is also available as a gist.
/**
* Run experiments to measure speed of various approaches to saving data in
* Google App Script (GAS).
*/
function testSpeed() {
var numObj = 400;
var numAttr = 10;
var doFlush = false; // Set true to activate calls to SpreadsheetApp.flush()
var arr = buildArray(numObj,numAttr);
var start, stop; // time catchers
var db = ScriptDb.getMyDb();
var sheet;
// Save into ScriptDB, Object at a time
deleteAll(); // Clear ScriptDB
start = new Date().getTime();
for (var i=1; i<=numObj; i++) {
db.save({type: "myObj", data:arr[i]});
}
stop = new Date().getTime();
Logger.log("Elapsed time for ScriptDB/Object test: " + (stop - start));
// Save into ScriptDB, Batch
var items = [];
// Restructure data - this is done outside the timed loop, assuming that
// the data would not be in an array if we were using this approach.
for (var obj=1; obj<=numObj; obj++) {
var thisObj = new Object();
for (var attr=0; attr < numAttr; attr++) {
thisObj[arr[0][attr]] = arr[obj][attr];
}
items.push(thisObj);
}
deleteAll(); // Clear ScriptDB
start = new Date().getTime();
db.saveBatch(items, false);
stop = new Date().getTime();
Logger.log("Elapsed time for ScriptDB/Batch test: " + (stop - start));
// Save into Spreadsheet, Object at a time
sheet = SpreadsheetApp.getActive().getActiveSheet().clear();
start = new Date().getTime();
for (var row=0; row<=numObj; row++) {
var values = [];
values.push(arr[row]);
sheet.getRange(row+1, 1, 1, numAttr).setValues(values);
if (doFlush) SpreadsheetApp.flush();
}
stop = new Date().getTime();
Logger.log("Elapsed time for Spreadsheet/Object test: " + (stop - start));
// Save into Spreadsheet, Attribute at a time
sheet = SpreadsheetApp.getActive().getActiveSheet().clear();
start = new Date().getTime();
for (var row=0; row<=numObj; row++) {
for (var cell=0; cell<numAttr; cell++) {
sheet.getRange(row+1, cell+1, 1, 1).setValue(arr[row][cell]);
if (doFlush) SpreadsheetApp.flush();
}
}
stop = new Date().getTime();
Logger.log("Elapsed time for Spreadsheet/Attribute test: " + (stop - start));
// Save into Spreadsheet, Bulk
sheet = SpreadsheetApp.getActive().getActiveSheet().clear();
start = new Date().getTime();
sheet.getRange(1, 1, numObj+1, numAttr).setValues(arr);
if (doFlush) SpreadsheetApp.flush();
stop = new Date().getTime();
Logger.log("Elapsed time for Spreadsheet/Bulk test: " + (stop - start));
}
/**
* Create a two-dimensional array populated with 'numObj' rows of 'numAttr' cells.
*/
function buildArray(numObj,numAttr) {
numObj = numObj || 400; // default to 400 when no value is passed
numAttr = numAttr || 10; // default to 10 when no value is passed
var array = [];
for (var obj = 0; obj <= numObj; obj++) {
array[obj] = [];
for (var attr = 0; attr < numAttr; attr++) {
var value;
if (obj == 0) {
// Define attribute names / column headers
value = "Attr"+attr;
}
else {
value = ((attr % 2) == 0) ? "This is a reasonable sized string for testing purposes, not too long, not too short." : Number.MAX_VALUE;
}
array[obj].push(value);
}
}
return array;
}
function deleteAll() {
var db = ScriptDb.getMyDb();
while (true) {
var result = db.query({}); // get everything, up to limit
if (result.getSize() == 0) {
break;
}
while (result.hasNext()) {
var item = result.next();
db.remove(item);
}
}
}
ScriptDB has been deprecated. Do not use.
I'm trying to delete all calendar entries from today forward. I run a query then call getEntries() on the query result. getEntries() always returns 25 entries (or less if there are fewer than 25 entries on the calendar). Why aren't all the entries returned? I'm expecting about 80 entries.
As a test, I tried running the query, deleting the 25 entries returned, running the query again, deleting again, etc. This works, but there must be a better way.
Below is the Java code that only runs the query once.
CalendarQuery myQuery = new CalendarQuery(feedUrl);
DateFormat dfGoogle = new SimpleDateFormat("yyyy-MM-dd'T00:00:00'");
Date dt = Calendar.getInstance().getTime();
myQuery.setMinimumStartTime(DateTime.parseDateTime(dfGoogle.format(dt)));
// Make the end time far into the future so we delete everything
myQuery.setMaximumStartTime(DateTime.parseDateTime("2099-12-31T23:59:59"));
// Execute the query and get the response
CalendarEventFeed resultFeed = service.query(myQuery, CalendarEventFeed.class);
// !!! This returns 25 (or less if there are fewer than 25 entries on the calendar) !!!
int test = resultFeed.getEntries().size();
// Delete all the entries returned by the query
for (int j = 0; j < resultFeed.getEntries().size(); j++) {
CalendarEventEntry entry = resultFeed.getEntries().get(j);
entry.delete();
}
PS: I've looked at the Data API Developer's Guide and the Google Data API Javadoc. These sites are okay, but not great. Does anyone know of additional Google API documentation?
You can increase the number of results with myQuery.setMaxResults(). There will be an upper limit on that maximum, though, so you can make multiple queries ('paged' results) by varying myQuery.setStartIndex().
http://code.google.com/apis/gdata/javadoc/com/google/gdata/client/Query.html#setMaxResults(int)
http://code.google.com/apis/gdata/javadoc/com/google/gdata/client/Query.html#setStartIndex(int)
Based on the answers from Jim Blackler and Chris Kaminski, I enhanced my code to read the query results in pages. I also do the delete as a batch, which should be faster than doing individual deletions.
I'm providing the Java code here in case it is useful to anyone.
CalendarQuery myQuery = new CalendarQuery(feedUrl);
DateFormat dfGoogle = new SimpleDateFormat("yyyy-MM-dd'T00:00:00'");
Date dt = Calendar.getInstance().getTime();
myQuery.setMinimumStartTime(DateTime.parseDateTime(dfGoogle.format(dt)));
// Make the end time far into the future so we delete everything
myQuery.setMaximumStartTime(DateTime.parseDateTime("2099-12-31T23:59:59"));
// Set the maximum number of results to return for the query.
// Note: A GData server may choose to provide fewer results, but will never provide
// more than the requested maximum.
myQuery.setMaxResults(5000);
int startIndex = 1;
int entriesReturned;
List<CalendarEventEntry> allCalEntries = new ArrayList<CalendarEventEntry>();
CalendarEventFeed resultFeed;
// Run our query as many times as necessary to get all the
// Google calendar entries we want
while (true) {
myQuery.setStartIndex(startIndex);
// Execute the query and get the response
resultFeed = service.query(myQuery, CalendarEventFeed.class);
entriesReturned = resultFeed.getEntries().size();
if (entriesReturned == 0)
// We've hit the end of the list
break;
// Add the returned entries to our local list
allCalEntries.addAll(resultFeed.getEntries());
startIndex = startIndex + entriesReturned;
}
// Delete all the entries as a batch delete
CalendarEventFeed batchRequest = new CalendarEventFeed();
for (int i = 0; i < allCalEntries.size(); i++) {
CalendarEventEntry entry = allCalEntries.get(i);
BatchUtils.setBatchId(entry, Integer.toString(i));
BatchUtils.setBatchOperationType(entry, BatchOperationType.DELETE);
batchRequest.getEntries().add(entry);
}
// Get the batch link URL and send the batch request
Link batchLink = resultFeed.getLink(Link.Rel.FEED_BATCH, Link.Type.ATOM);
CalendarEventFeed batchResponse = service.batch(new URL(batchLink.getHref()), batchRequest);
// Ensure that all the operations were successful
boolean isSuccess = true;
StringBuffer batchFailureMsg = new StringBuffer("These entries in the batch delete failed:");
for (CalendarEventEntry entry : batchResponse.getEntries()) {
String batchId = BatchUtils.getBatchId(entry);
if (!BatchUtils.isSuccess(entry)) {
isSuccess = false;
BatchStatus status = BatchUtils.getBatchStatus(entry);
batchFailureMsg.append("\nID: " + batchId + " Reason: " + status.getReason());
}
}
if (!isSuccess) {
throw new Exception(batchFailureMsg.toString());
}
There is a small quote on the API page
http://code.google.com/apis/calendar/data/1.0/reference.html#Parameters
Note: The max-results query parameter for Calendar is set to 25 by default, so that you won't receive an entire calendar feed by accident. If you want to receive the entire feed, you can specify a very large number for max-results.
So to get all events from a google calendar feed, we do this:
google.calendarurl.com/.../basic?max-results=999999
In the API you can also set this on the query with setMaxResults(999999).
I got here while searching for a Python solution.
Should anyone be stuck in the same way, the important line is the fourth:
query = gdata.calendar.service.CalendarEventQuery(cal, visibility, projection)
query.start_min = start_date
query.start_max = end_date
query.max_results = 1000
Unfortunately, Google is going to limit the maximum number of results you can retrieve per query. This keeps each query within their guidelines (HTTP requests are not allowed to take more than 30 seconds, for example). They've built their whole architecture around this, so you might as well keep the paging logic you have.