What is faster: ScriptDb or SpreadsheetApp?

Let's say I have a script that iterates over a list of 400 objects.
Each object has anywhere from 1 to 10 properties.
Each property is a reasonable size string or a somewhat large integer.
Is there a significant difference in performance between saving these objects into ScriptDB and saving them into a spreadsheet (without doing it in one bulk operation)?

Executive Summary
Yes, there is a significant difference! Huge! And I have to admit that this experiment didn't turn out the way I expected.
With this amount of data, writing to a spreadsheet was always much faster than using ScriptDB.
These experiments support the assertions regarding bulk operations in the Google Apps Script Best Practices. Saving data in a spreadsheet using a single setValues() call was 75% faster than line-by-line, and two orders of magnitude faster than cell-by-cell.
On the other hand, recommendations to use Spreadsheet.flush() should be considered carefully, due to the performance impact. In these experiments, a single write of a 4000-cell spreadsheet took less than 50ms, and adding a call to flush() increased that to 610ms - still less than a second, but an order of magnitude tax seems ludicrous. Calling flush() for each of the 400 rows in the sample spreadsheet made the operation take almost 12 seconds, when it took just 164 ms without it. If you've been experiencing Exceeded maximum execution time errors, you may benefit from both optimizing your code AND removing calls to flush().
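If you just want the pattern behind those numbers, the difference comes down to how many calls you make to the Spreadsheet service. Here is a minimal sketch, assuming sheet is an existing Sheet and data is a rectangular two-dimensional array (both hypothetical names, mirroring what the full test script below does):

// Slow: one service call per cell
for (var r = 0; r < data.length; r++) {
  for (var c = 0; c < data[r].length; c++) {
    sheet.getRange(r + 1, c + 1).setValue(data[r][c]);
  }
}

// Faster: one service call per row
for (var r = 0; r < data.length; r++) {
  sheet.getRange(r + 1, 1, 1, data[r].length).setValues([data[r]]);
}

// Fastest: one service call for the whole block
sheet.getRange(1, 1, data.length, data[0].length).setValues(data);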
Experimental Results
All timings were derived following the technique described in How to measure time taken by a function to execute. Times are expressed in milliseconds.
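For reference, this is the timing pattern used throughout the test script at the end of this answer:

var start = new Date().getTime();
// ... the operation being measured ...
var stop = new Date().getTime();
Logger.log("Elapsed time: " + (stop - start)); // milliseconds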
Here are the results from a single pass of five different approaches, two using ScriptDB, three writing to Spreadsheets, all with the same source data. (400 objects with 5 String & 5 Number attributes)
Experiment 1
Elapsed time for ScriptDB/Object test: 53529
Elapsed time for ScriptDB/Batch test: 37700
Elapsed time for Spreadsheet/Object test: 145
Elapsed time for Spreadsheet/Attribute test: 4045
Elapsed time for Spreadsheet/Bulk test: 32
Effect of Spreadsheet.flush()
Experiment 2
In this experiment, the only difference from Experiment 1 was that we called Spreadsheet.flush() after every setValue/s call. The cost of doing so is dramatic (around 700%), but it does not change the recommendation to use a spreadsheet over ScriptDB for speed reasons, because writing to spreadsheets is still faster.
Elapsed time for ScriptDB/Object test: 55282
Elapsed time for ScriptDB/Batch test: 37370
Elapsed time for Spreadsheet/Object test: 11888
Elapsed time for Spreadsheet/Attribute test: 117388
Elapsed time for Spreadsheet/Bulk test: 610
Note: This experiment was often killed with Exceeded maximum execution time.
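For reference, the only code change between the two experiments is the doFlush flag in the script at the end of this answer; a trimmed excerpt of the row-at-a-time loop shows the guarded flush:

var doFlush = true; // false in Experiment 1, true in Experiment 2
// ...
sheet.getRange(row + 1, 1, 1, numAttr).setValues(values);
if (doFlush) SpreadsheetApp.flush();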
Caveat Emptor
You're reading this on the interwebs, so it must be true! But take it with a grain of salt.
These are results from very small sample sizes, and may not be completely reproducible.
These results are measuring something that changes constantly - while they were observed on Feb 28 2013, the system they measured could be completely different when you read this.
The efficiency of these operations is affected by many factors that are not controlled in these experiments; caching of instructions & intermediate results and server load, for example.
Maybe, just maybe, someone at Google will read this, and improve the efficiency of ScriptDB!
The Code
If you want to perform (or better yet, improve) these experiments, create a blank spreadsheet, and copy this into a new script within it. This is also available as a gist.
/**
 * Run experiments to measure speed of various approaches to saving data in
 * Google Apps Script (GAS).
 */
function testSpeed() {
  var numObj = 400;
  var numAttr = 10;
  var doFlush = false;  // Set true to activate calls to SpreadsheetApp.flush()
  var arr = buildArray(numObj, numAttr);
  var start, stop;      // time catchers
  var db = ScriptDb.getMyDb();
  var sheet;

  // Save into ScriptDB, Object at a time
  deleteAll(); // Clear ScriptDB
  start = new Date().getTime();
  for (var i = 1; i <= numObj; i++) {
    db.save({type: "myObj", data: arr[i]});
  }
  stop = new Date().getTime();
  Logger.log("Elapsed time for ScriptDB/Object test: " + (stop - start));

  // Save into ScriptDB, Batch
  var items = [];
  // Restructure data - this is done outside the timed loop, assuming that
  // the data would not be in an array if we were using this approach.
  for (var obj = 1; obj <= numObj; obj++) {
    var thisObj = new Object();
    for (var attr = 0; attr < numAttr; attr++) {
      thisObj[arr[0][attr]] = arr[obj][attr];
    }
    items.push(thisObj);
  }
  deleteAll(); // Clear ScriptDB
  start = new Date().getTime();
  db.saveBatch(items, false);
  stop = new Date().getTime();
  Logger.log("Elapsed time for ScriptDB/Batch test: " + (stop - start));

  // Save into Spreadsheet, Object at a time
  sheet = SpreadsheetApp.getActive().getActiveSheet().clear();
  start = new Date().getTime();
  for (var row = 0; row <= numObj; row++) {
    var values = [];
    values.push(arr[row]);
    sheet.getRange(row + 1, 1, 1, numAttr).setValues(values);
    if (doFlush) SpreadsheetApp.flush();
  }
  stop = new Date().getTime();
  Logger.log("Elapsed time for Spreadsheet/Object test: " + (stop - start));

  // Save into Spreadsheet, Attribute at a time
  sheet = SpreadsheetApp.getActive().getActiveSheet().clear();
  start = new Date().getTime();
  for (var row = 0; row <= numObj; row++) {
    for (var cell = 0; cell < numAttr; cell++) {
      sheet.getRange(row + 1, cell + 1, 1, 1).setValue(arr[row][cell]);
      if (doFlush) SpreadsheetApp.flush();
    }
  }
  stop = new Date().getTime();
  Logger.log("Elapsed time for Spreadsheet/Attribute test: " + (stop - start));

  // Save into Spreadsheet, Bulk
  sheet = SpreadsheetApp.getActive().getActiveSheet().clear();
  start = new Date().getTime();
  sheet.getRange(1, 1, numObj + 1, numAttr).setValues(arr);
  if (doFlush) SpreadsheetApp.flush();
  stop = new Date().getTime();
  Logger.log("Elapsed time for Spreadsheet/Bulk test: " + (stop - start));
}
/**
 * Create a two-dimensional array with a header row plus 'numObj' data rows,
 * each with 'numAttr' cells.
 */
function buildArray(numObj, numAttr) {
  numObj = numObj || 400;   // default number of objects
  numAttr = numAttr || 10;  // default number of attributes
  var array = [];
  for (var obj = 0; obj <= numObj; obj++) {
    array[obj] = [];
    for (var attr = 0; attr < numAttr; attr++) {
      var value;
      if (obj == 0) {
        // Define attribute names / column headers
        value = "Attr" + attr;
      }
      else {
        value = ((attr % 2) == 0) ? "This is a reasonable sized string for testing purposes, not too long, not too short." : Number.MAX_VALUE;
      }
      array[obj].push(value);
    }
  }
  return array;
}
function deleteAll() {
  var db = ScriptDb.getMyDb();
  while (true) {
    var result = db.query({}); // get everything, up to limit
    if (result.getSize() == 0) {
      break;
    }
    while (result.hasNext()) {
      var item = result.next();
      db.remove(item);
    }
  }
}

ScriptDB has been deprecated. Do not use.
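If you are migrating code like the above today, one possible replacement for small amounts of data is PropertiesService. This is only a hedged sketch under the assumption that your records are small; the property store has its own size quotas, and larger datasets still belong in a spreadsheet or an external database. saveRecord and loadRecord are illustrative names, not an existing API:

// Hedged sketch: persist and reload a small object as a JSON string.
function saveRecord(key, obj) {
  PropertiesService.getScriptProperties().setProperty(key, JSON.stringify(obj));
}

function loadRecord(key) {
  var raw = PropertiesService.getScriptProperties().getProperty(key);
  return raw ? JSON.parse(raw) : null;
}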

Related

How can I speed up the performance of Spring JPA?

The code
int offset = 0;
int bulkSize = eodFileConfig.getBulkSize(); // sample of 10k
setDateTimeFormat();
//Get total record from Temp table
long max = extQRMerchantTrxHistService.getTotalRecords();
do {
    log.debug("[execute] start to write to actual table");
    // bulk size represent how many items in a page, offset is the page
    Page<ExtensionQRMerchantTrxHistEntity> records = extQRMerchantTrxHistService.findRecordsWithPagination(offset, bulkSize);
    List<TransactionHistoryExtEntity> transactions = new ArrayList<>();
    for (ExtensionQRMerchantTrxHistEntity tempEntity : records) {
        log.debug("Record: {} ", tempEntity);
        Date a = new Date();
        //Query from T_TRXN_DETAIL_EXT
        Date dateTime = sf.parse(tempEntity.getTransactionDate());
        List<TransactionHistoryExtEntity> histories =
            transactionHistoryInquiryService.retrieveHistoryBasedOnRefNoDateAmt(
                tempEntity.getTransactionRefNo(), dateTime, tempEntity.getTransactionAmount());
        Date b = new Date();
        System.out.println("Query Time: " + Math.abs(a.getTime() - b.getTime()));
        Date c = new Date();
        TransactionHistoryExtEntity transaction;
        if (histories.isEmpty()) {
            //Insert record
            transaction = setTransactionHistory(Boolean.TRUE, tempEntity, null);
        } else {
            //Update record
            transaction = setTransactionHistory(Boolean.FALSE, tempEntity, histories.get(0));
        }
        Date d = new Date();
        System.out.println("Query Time: " + Math.abs(c.getTime() - d.getTime()));
        Date e = new Date();
        //transactionHistoryExtRepository.saveAndFlush(transaction);
        Date f = new Date();
        System.out.println("Query Time: " + Math.abs(e.getTime() - f.getTime()));
        //Add to list
        transactions.add(transaction);
    }
    //Save & Update all records
    transactionHistoryExtRepository.saveAll(transactions);
    offset++;
} while ((long) offset * bulkSize < max);
The query
List<TransactionHistoryExtEntity> findTopByReferenceNumberAndTransactionDateAndAmountOrderByTransactionDateDesc(
String referenceNo, Date transactionDate, BigDecimal amount);
I am a bit new to Spring Boot. I am trying to insert/update 50k-100k records into the table. Running 10k seems fast enough; however, as the size increases, the time taken by the query part inside the inner for loop increases too. For a run of 50k records with a 10k bulk size, the first 10k (the entire inner iteration) took about 80 seconds to complete, the next 10k took 472 seconds, and the last 10k took 1k+ seconds to process. Can anyone explain what is causing this? Also, I cleared my table before executing this, so it is empty when I run it.
I also tried different bulk sizes such as 30k and 50k. With a 30k bulk size (two outer loops) everything completes in about 38 minutes, and with 50k in about 33 minutes, whereas 10k takes about an hour for 50k records. But using such large pages defeats the purpose of having pagination in the first place.
I noticed that the average query time spikes after the saveAll method. I am not sure whether that is due to saveAll itself or to the growing number of records. Without the query part in the inner loop, the entire process takes only about a minute to complete.
Does anyone have any idea regarding the issue and how I can increase the performance?

Why is there consistent variation in execution time on my timed trigger?

I have a timed trigger that runs every 15 minutes. A simplified partial version of the script is shown below. The script compiles data from about 50 other spreadsheets and records a row for each spreadsheet, then writes that summary data to the active spreadsheet.
I noticed that in the logs, there is an alternating pattern in the execution times for this script: half the executions take 200-400 seconds, and the other half typically take 700-900 seconds. It's a pretty significant difference, and the pattern persists over the past several days of logs.
There's nothing in the script itself that changes from one execution to the next, so I'm curious if anyone can suggest a reason this would happen (even better if it's a documented reason). For example, is there some sort of caching of the spreadsheet reads so that the next execution gets those values faster?
// The triggered function.
function updateRankings()
{
  var rankingSheet = SS.getSheetByName(RANKING_SHEET_NAME) // SS is the active spreadsheet
  // Read the id's of the target spreadsheets, which are stored on an external spreadsheet
  var gyms = getRowsData(SpreadsheetApp.openById(ADMIN_PANEL_ID).getSheetByName(ADMIN_PANEL_SHEET_NAME))
  // Iterate over gyms
  gyms.forEach(getGymStats)
  // Write the compiled data back to the active sheet
  setRowsData(rankingSheet, gyms)
}

function getGymStats(gym)
{
  var gymSpreadsheet = SpreadsheetApp.openById(gym.spreadsheetId)
  // Force spreadsheet formulas to calculate before reading values
  SpreadsheetApp.flush()
  var metricsSheet = gymSpreadsheet.getSheetByName('Detailed Metrics')
  var statsColumn = metricsSheet.getRange('E:E').getValues()
  var roasColumn = metricsSheet.getRange('J:J').getValues()
  // Get stats
  var gymStats = {
    facebookAdSpend: getFacebookAdSpend(gymSpreadsheet),
    scheduling: statsColumn[8][0],
    showup: statsColumn[9][0],
    closing: statsColumn[10][0],
    costPerLead: statsColumn[25][0],
    costPerAppointment: statsColumn[26][0],
    costPerShow: statsColumn[27][0],
    costPerAcquisition: statsColumn[28][0],
    leadCount: statsColumn[13][0],
    frontEndRoas: (roasColumn[21][0] / statsColumn[5][0]) || 0,
    totalRoas: (roasColumn[35][0] / statsColumn[5][0]) || 0,
    totalProjectedRoas: (roasColumn[36][0] / statsColumn[5][0]) || 0,
    conversionRate: (gym.currency ?
      '=IFS(ISBLANK(INDIRECT("R[0]C[-4]", FALSE)),,ISBLANK(INDIRECT("R[0]C[-2]", FALSE)), 1,TRUE, IFERROR(GOOGLEFINANCE("Currency:"&INDIRECT("R[0]C[-2]", FALSE)&"USD")))' :
      1)
  }
  Object.assign(gym, gymStats)
}

function getFacebookAdSpend(spreadsheet)
{
  var range = spreadsheet.getRangeByName('FacebookAdSpend')
  if (!range) return ''
  return range.getValue()
}

Google calendar query returns at most 25 entries

I'm trying to delete all calendar entries from today forward. I run a query then call getEntries() on the query result. getEntries() always returns 25 entries (or less if there are fewer than 25 entries on the calendar). Why aren't all the entries returned? I'm expecting about 80 entries.
As a test, I tried running the query, deleting the 25 entries returned, running the query again, deleting again, etc. This works, but there must be a better way.
Below is the Java code that only runs the query once.
CalendarQuery myQuery = new CalendarQuery(feedUrl);
DateFormat dfGoogle = new SimpleDateFormat("yyyy-MM-dd'T00:00:00'");
Date dt = Calendar.getInstance().getTime();
myQuery.setMinimumStartTime(DateTime.parseDateTime(dfGoogle.format(dt)));
// Make the end time far into the future so we delete everything
myQuery.setMaximumStartTime(DateTime.parseDateTime("2099-12-31T23:59:59"));
// Execute the query and get the response
CalendarEventFeed resultFeed = service.query(myQuery, CalendarEventFeed.class);
// !!! This returns 25 (or less if there are fewer than 25 entries on the calendar) !!!
int test = resultFeed.getEntries().size();
// Delete all the entries returned by the query
for (int j = 0; j < resultFeed.getEntries().size(); j++) {
    CalendarEventEntry entry = resultFeed.getEntries().get(j);
    entry.delete();
}
PS: I've looked at the Data API Developer's Guide and the Google Data API Javadoc. These sites are okay, but not great. Does anyone know of additional Google API documentation?
You can increase the number of results with myQuery.setMaxResults(). There is still an upper limit on that maximum, though, so you can retrieve the rest by making multiple ('paged') queries, varying myQuery.setStartIndex().
http://code.google.com/apis/gdata/javadoc/com/google/gdata/client/Query.html#setMaxResults(int)
http://code.google.com/apis/gdata/javadoc/com/google/gdata/client/Query.html#setStartIndex(int)
Based on the answers from Jim Blackler and Chris Kaminski, I enhanced my code to read the query results in pages. I also do the delete as a batch, which should be faster than doing individual deletions.
I'm providing the Java code here in case it is useful to anyone.
CalendarQuery myQuery = new CalendarQuery(feedUrl);
DateFormat dfGoogle = new SimpleDateFormat("yyyy-MM-dd'T00:00:00'");
Date dt = Calendar.getInstance().getTime();
myQuery.setMinimumStartTime(DateTime.parseDateTime(dfGoogle.format(dt)));
// Make the end time far into the future so we delete everything
myQuery.setMaximumStartTime(DateTime.parseDateTime("2099-12-31T23:59:59"));
// Set the maximum number of results to return for the query.
// Note: A GData server may choose to provide fewer results, but will never provide
// more than the requested maximum.
myQuery.setMaxResults(5000);
int startIndex = 1;
int entriesReturned;
List<CalendarEventEntry> allCalEntries = new ArrayList<CalendarEventEntry>();
CalendarEventFeed resultFeed;
// Run our query as many times as necessary to get all the
// Google calendar entries we want
while (true) {
    myQuery.setStartIndex(startIndex);
    // Execute the query and get the response
    resultFeed = service.query(myQuery, CalendarEventFeed.class);
    entriesReturned = resultFeed.getEntries().size();
    if (entriesReturned == 0)
        // We've hit the end of the list
        break;
    // Add the returned entries to our local list
    allCalEntries.addAll(resultFeed.getEntries());
    startIndex = startIndex + entriesReturned;
}
// Delete all the entries as a batch delete
CalendarEventFeed batchRequest = new CalendarEventFeed();
for (int i = 0; i < allCalEntries.size(); i++) {
    CalendarEventEntry entry = allCalEntries.get(i);
    BatchUtils.setBatchId(entry, Integer.toString(i));
    BatchUtils.setBatchOperationType(entry, BatchOperationType.DELETE);
    batchRequest.getEntries().add(entry);
}
// Get the batch link URL and send the batch request
Link batchLink = resultFeed.getLink(Link.Rel.FEED_BATCH, Link.Type.ATOM);
CalendarEventFeed batchResponse = service.batch(new URL(batchLink.getHref()), batchRequest);
// Ensure that all the operations were successful
boolean isSuccess = true;
StringBuffer batchFailureMsg = new StringBuffer("These entries in the batch delete failed:");
for (CalendarEventEntry entry : batchResponse.getEntries()) {
    String batchId = BatchUtils.getBatchId(entry);
    if (!BatchUtils.isSuccess(entry)) {
        isSuccess = false;
        BatchStatus status = BatchUtils.getBatchStatus(entry);
        batchFailureMsg.append("\nID: " + batchId + " Reason: " + status.getReason());
    }
}
if (!isSuccess) {
    throw new Exception(batchFailureMsg.toString());
}
There is a small quote on the API page
http://code.google.com/apis/calendar/data/1.0/reference.html#Parameters
Note: The max-results query parameter for Calendar is set to 25 by default, so that you won't receive an entire calendar feed by accident. If you want to receive the entire feed, you can specify a very large number for max-results.
So to get all events from a google calendar feed, we do this:
google.calendarurl.com/.../basic?max-results=999999
In the API you can achieve the same by calling setMaxResults(999999) on the query.
I got here while searching for a Python solution;
Should anyone be stuck in the same way, the important line is the fourth:
query = gdata.calendar.service.CalendarEventQuery(cal, visibility, projection)
query.start_min = start_date
query.start_max = end_date
query.max_results = 1000
Unfortunately, Google limits the maximum number of entries you can retrieve per query. This is to keep requests within their governor limits (an HTTP request is not allowed to take more than 30 seconds, for example). They've built their whole architecture around this, so you might as well build the paging logic as you have.

Linq Efficiency question - foreach vs aggregates

Which is more efficient?
//Option 1
foreach (var q in baseQuery)
{
    m_TotalCashDeposit += q.deposit.Cash;
    m_TotalCheckDeposit += q.deposit.Check;
    m_TotalCashWithdrawal += q.withdraw.Cash;
    m_TotalCheckWithdrawal += q.withdraw.Check;
}
//Option 2
m_TotalCashDeposit = baseQuery.Sum(q => q.deposit.Cash);
m_TotalCheckDeposit = baseQuery.Sum(q => q.deposit.Check);
m_TotalCashWithdrawal = baseQuery.Sum(q => q.withdraw.Cash);
m_TotalCheckWithdrawal = baseQuery.Sum(q => q.withdraw.Check);
I guess what I'm asking is, calling Sum will basically enumerate over the list right? So if I call Sum four times, isn't that enumerating over the list four times? Wouldn't it be more efficient to just do a foreach instead so I only have to enumerate the list once?
It might, and it might not; it depends.
The only sure way to know is to actually measure it.
To do that, use BenchmarkDotNet, here's an example which you can run in LINQPad or a console application:
void Main()
{
    BenchmarkSwitcher.FromAssembly(GetType().Assembly).RunAll();
}

public class Benchmarks
{
    [Benchmark]
    public void Option1()
    {
        // foreach (var q in baseQuery)
        // {
        //     m_TotalCashDeposit += q.deposit.Cash;
        //     m_TotalCheckDeposit += q.deposit.Check;
        //     m_TotalCashWithdrawal += q.withdraw.Cash;
        //     m_TotalCheckWithdrawal += q.withdraw.Check;
        // }
    }

    [Benchmark]
    public void Option2()
    {
        // m_TotalCashDeposit = baseQuery.Sum(q => q.deposit.Cash);
        // m_TotalCheckDeposit = baseQuery.Sum(q => q.deposit.Check);
        // m_TotalCashWithdrawal = baseQuery.Sum(q => q.withdraw.Cash);
        // m_TotalCheckWithdrawal = baseQuery.Sum(q => q.withdraw.Check);
    }
}
BenchmarkDotNet is a powerful library for measuring performance, and is much more accurate than simply using Stopwatch, as it will use statistically correct approaches and methods, and also take such things as JITting and GC into account.
Now that I'm older and wiser, I no longer believe using Stopwatch is a good way to measure performance. I won't remove the old answer, since Google searches and similar links may lead people here looking for how to use Stopwatch for timing, but I hope I have added a better approach above.
Original answer below
Simple code to measure it:
Stopwatch sw = new Stopwatch();
sw.Start();
// your code here
sw.Stop();
Debug.WriteLine("Time taken: " + sw.ElapsedMilliseconds + " ms");
sw.Reset(); // in case you have more code below that reuses sw
You should run the code multiple times so that JITting doesn't have too large an effect on your timings.
I went ahead and profiled this and found that you are correct.
Each Sum() effectively creates its own loop. In my simulation, I had it sum a SQL dataset with 20,319 records, each with 3 summable fields, and found that writing your own loop had a 2x advantage.
I had hoped that LINQ would optimize this away and push the whole burden on the SQL server, but unless I move the sum request into the initial LINQ statement, it executes each request one at a time.

Paging a collection with LINQ

How do you page through a collection in LINQ given that you have a startIndex and a count?
It is very simple with the Skip and Take extension methods.
var query = from i in ideas
            select i;

var pagedCollection = query.Skip(startIndex).Take(count);
A few months back I wrote a blog post about Fluent Interfaces and LINQ which used an Extension Method on IQueryable<T> and another class to provide the following natural way of paginating a LINQ collection.
var query = from i in ideas
            select i;

var pagedCollection = query.InPagesOf(10);
var pageOfIdeas = pagedCollection.Page(2);
You can get the code from the MSDN Code Gallery Page: Pipelines, Filters, Fluent API and LINQ to SQL.
I solved this a bit differently than what the others have as I had to make my own paginator, with a repeater. So I first made a collection of page numbers for the collection of items that I have:
// assumes that the item collection is "myItems"
int pageCount = (myItems.Count + PageSize - 1) / PageSize;
IEnumerable<int> pageRange = Enumerable.Range(1, pageCount);
// pageRange contains [1, 2, ... , pageCount]
Using this I could easily partition the item collection into a collection of "pages". A page in this case is just a collection of items (IEnumerable<Item>). This is how you can do it using Skip and Take together with selecting the index from the pageRange created above:
IEnumerable<IEnumerable<Item>> pages = pageRange
    .Select((page, index) =>
        myItems
            .Skip(index * PageSize)
            .Take(PageSize));
Of course you have to handle each page as an additional collection but e.g. if you're nesting repeaters then this is actually easy to handle.
The one-liner TLDR version would be this:
var pages = Enumerable
.Range(0, pageCount)
.Select((index) => myItems.Skip(index*PageSize).Take(PageSize));
This can be used like so:
foreach (IEnumerable<Item> page in pages)
{
    // handle page
    foreach (Item item in page)
    {
        // handle item in page
    }
}
This question is somewhat old, but I wanted to post my paging algorithm that shows the whole procedure (including user interaction).
const int pageSize = 10;
const int count = 100;
const int startIndex = 20;
int took = 0;
bool getNextPage = true;
var page = ideas.Skip(startIndex);
do
{
    Console.WriteLine("Page {0}:", (took / pageSize) + 1);
    foreach (var idea in page.Take(pageSize))
    {
        Console.WriteLine(idea);
    }
    took += pageSize;
    if (took < count)
    {
        Console.WriteLine("Next page (y/n)?");
        char answer = Console.ReadLine().FirstOrDefault();
        getNextPage = default(char) != answer && 'y' == char.ToLowerInvariant(answer);
        if (getNextPage)
        {
            page = page.Skip(pageSize);
        }
    }
}
while (getNextPage && took < count);
However, if you are after performance (and in production code we're all after performance), you shouldn't use LINQ's paging as shown above, but rather use the underlying IEnumerator to implement paging yourself. In fact, it is about as simple as the LINQ algorithm shown above, but more performant:
const int pageSize = 10;
const int count = 100;
const int startIndex = 20;
int took = 0;
bool getNextPage = true;
using (var page = ideas.Skip(startIndex).GetEnumerator())
{
    do
    {
        Console.WriteLine("Page {0}:", (took / pageSize) + 1);
        int currentPageItemNo = 0;
        while (currentPageItemNo++ < pageSize && page.MoveNext())
        {
            var idea = page.Current;
            Console.WriteLine(idea);
        }
        took += pageSize;
        if (took < count)
        {
            Console.WriteLine("Next page (y/n)?");
            char answer = Console.ReadLine().FirstOrDefault();
            getNextPage = default(char) != answer && 'y' == char.ToLowerInvariant(answer);
        }
    }
    while (getNextPage && took < count);
}
Explanation: The downside of calling Skip() multiple times in a cascading manner is that it does not really store a pointer to where the iteration last stopped. Instead, the original sequence gets front-loaded with skip calls, which leads to consuming the already consumed pages over and over again. You can prove this yourself by creating the sequence ideas so that it yields side effects: even if you have already skipped items 10-20 and 20-30 and want to process 40+, you will see all the side effects of 10-30 executed again before you start iterating 40+.
The variant that uses the IEnumerator directly will instead remember the position at the end of the last logical page, so no explicit skipping is needed and the side effects are not repeated.
