How to mitigate users adding blank rows to a Google Spreadsheet and thus breaking ListFeed API? - gdata-api

We use Google Spreadsheets to collect research data and allow users to enter data directly into spreadsheets that have been programmatically generated. This worked fairly well until a user entered a blank line in between data rows! They may do this for readability, or they may have deleted a row; anyway...
Google's documentation is clear on this:
https://developers.google.com/google-apps/spreadsheets/#retrieving_a_list-based_feed
The list feed contains all rows after the first row up to the first blank row.
So the problem is that I have 'harvester' scripts that rip through these spreadsheets, collecting data for archival / local databasing. These scripts use the ListFeed, so they stop when they reach a blank row and miss data!
The documentation suggests:
If expected data isn't appearing in a feed, check the worksheet manually to see whether there's an unexpected blank row in the middle of the data.
Manually! Gasp, I have hundreds of sheets :) Do you have suggestions for mitigating this situation other than yelling at users whenever I see this happen! Thank you

This is the only way that I think we can even get close with the spreadsheet API. This is NOT complete code, it's within a function I wrote but you get the drift... it's in C#:
Working with example:
--row 1 = header row
--row 2 = data
--row 3 = data
--row 4 = totally blank
--row 5 = data
--row 6-100 = totally blank
In English:
Get the worksheet's ListFeed.Entries.Count. ListFeeds ignore header row so in this example count would be "2".
Get the worksheet's CellFeed in order to cycle through the cells. CellFeeds DO include the header row as row 1, so in the example, from the perspective of a CellFeed, the first blank row must be row 4 (header=1, then 2 data rows, then first blank line which terminates the ListFeed set), therefore we should begin looking through cells at row 5 and beyond for any cell that is NOT empty:
foreach (WorksheetEntry entry in wsFeed.Entries)
{
    // Get the worksheet CellFeed:
    CellQuery cellQuery = new CellQuery(entry.CellFeedLink);
    CellFeed cellFeed = service.Query(cellQuery);

    // Get the worksheet ListFeed to compare with the CellFeed:
    AtomLink listFeedLink = entry.Links.FindService(
        GDataSpreadsheetsNameTable.ListRel, null
    );
    ListQuery listQuery = new ListQuery(listFeedLink.HRef.ToString());
    // need to have service object already created for this... see API docs
    ListFeed listFeed = service.Query(listQuery);

    // Now check whether there is data after the ListFeed set,
    // which would indicate a blank line in the data set (not allowed)
    foreach (CellEntry cell in cellFeed.Entries)
    {
        // start looking at cells in the row after what would be the first blank row
        if (cell.Row > listFeed.Entries.Count + 2)
        {
            if (!string.IsNullOrEmpty(cell.Value))
            {
                MessageBox.Show("ERROR: There appears to be a blank row " +
                    "in the middle of the data set in worksheet: " +
                    entry.Title.Text + ". Completely blank rows " +
                    "are not allowed in between data rows. Each row " +
                    "within the data set must have at least one " +
                    "value in at least one cell. There CAN and " +
                    "should be blank rows after the data set at " +
                    "the bottom of the worksheet.");
                return false;
            }
        }
    }
}
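The same check can be sketched without the Spreadsheets API. This is only an illustration in Python, assuming the worksheet has already been read into a list of rows (header first); the function name `find_rows_after_gap` is hypothetical and not part of any Google library.

```python
def find_rows_after_gap(rows):
    """Return (first_blank_row, orphan_rows) using 1-based sheet row numbers.

    rows: list of lists of cell values, with the header as the first row.
    orphan_rows are non-blank rows that come after the first blank row,
    i.e. the rows a ListFeed-style reader would miss.
    """
    def is_blank(row):
        return all(cell is None or str(cell).strip() == "" for cell in row)

    first_blank = None
    orphans = []
    for number, row in enumerate(rows, start=1):
        if number == 1:
            continue  # the header row is never part of the data set
        if is_blank(row):
            if first_blank is None:
                first_blank = number
        elif first_blank is not None:
            orphans.append(number)
    return first_blank, orphans


# The worked example above: header, two data rows, a blank row, one more data row.
example = [["name", "value"], ["a", 1], ["b", 2], ["", ""], ["c", 3], ["", ""]]
first_blank, orphans = find_rows_after_gap(example)
```

A harvester could log the worksheet title whenever `orphans` is non-empty instead of silently missing data.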

Related

Google Sheets add a Permanent timestamp

I am setting up a sheet where a person will be able to check a checkbox, in different times, depending on the progress of a task. So, there are 5 checkboxes per row, for a number of different tasks.
Now, the idea is that, when you check one of those checkboxes, a message builds up in the few next cells coming after. So, the message is built in 3 cells. The first cell is just text, the second one is the date, and the third one is time. Also, those cells have 5 paragraphs each (one per checkbox).
The problem comes when I try to make that timestamp stay as it was when it was entered. As it is right now, the time changes every time I update any part of the Google Sheet.
I set up my formulas as follows:
For the text message:
=IF($C4=TRUE,"Insert text 1 here","")&CHAR(10)&IF($E4=TRUE, "Insert text here","")&CHAR(10)&IF($G4=TRUE, "Insert text 3 here","")&CHAR(10)&IF($I4=TRUE, "Insert text 4 here","")&CHAR(10)&IF($K4=TRUE, "Insert text 5 here","")
For the date:
=IF($C4=TRUE,(TEXT(NOW(),"mmm dd yyyy")),"")&CHAR(10)&IF($E4=TRUE,(TEXT(NOW(),"mmm dd yyyy")),"")&CHAR(10)&IF($G4=TRUE,(TEXT(NOW(),"mmm dd yyyy")),"")&CHAR(10)&IF($I4=TRUE,(TEXT(NOW(),"mmm dd yyyy")),"")&CHAR(10)&IF($K4=TRUE,(TEXT(NOW(),"mmm dd yyyy")),"")
And for the time:
=IF($C4=TRUE,(TEXT(NOW(),"HH:mm")),"")&CHAR(10)&IF($E4=TRUE,(TEXT(NOW(),"HH:mm")),"")&CHAR(10)&IF($G4=TRUE,(TEXT(NOW(),"HH:mm")),"")&CHAR(10)&IF($I4=TRUE,(TEXT(NOW(),"HH:mm")),"")&CHAR(10)&IF($K4=TRUE,(TEXT(NOW(),"HH:mm")),"")
And it all looks like this:
I would appreciate it greatly if anyone could help me get this to work so that date and time are inserted after checking those boxes and they don't change again.
I noticed that you are struggling with the continuously changing date and time. I had the same struggle over the years, and I found a solution that works nicely for my case. It does need a little more "dirty work" with Apps Script, though.
Some background for my case:
I have multiple sheets in the spreadsheet that need timestamps generated.
I want to skip my first sheet, without generating timestamps in it.
I want every edit, including values pasted from Excel, to generate a timestamp.
I want the timestamps to be individual: each row has its own timestamp, precise to the second.
I don't want the timestamps of the entire sheet refreshed when I am editing any other row.
I have a column that is a MUST FILL value, used to decide whether a timestamp needs to be generated for that particular row.
I want my timestamps in a dedicated column only.
function timestamp() {
  const ss = SpreadsheetApp.getActiveSpreadsheet();
  const totalSheet = ss.getSheets();
  // Start at index 1 to skip the first sheet.
  for (let a=1; a<totalSheet.length; a++) {
    let sheet = ss.getSheets()[a];
    let range = sheet.getDataRange();
    let values = range.getValues();
    // Index of the first row whose first column is empty
    // (or the last row index if no such row exists).
    function autoCount() {
      let rowCount;
      for (let i = 0; i < values.length; i++) {
        rowCount = i;
        if (values[i][0] === '') {
          break;
        }
      }
      return rowCount;
    }
    const rowNum = autoCount();
    // Column 7 holds the timestamp; only fill it where it is still empty.
    for (let j=1; j<rowNum+1; j++) {
      if (sheet.getRange(j+1,7).getValue() === '') {
        sheet.getRange(j+1,7).setValue(new Date()).setNumberFormat("yyyy-MM-dd hh:mm:ss");
      }
    }
  }
}
Explanation
First, I made a const totalSheet with getSheets() and ran through it with a for loop, which covers every sheet inside the spreadsheet. Take note that I set let a=1 here: as everywhere in JavaScript, indexes start at 0, so starting at 1 skips the first sheet and runs on the second sheet onwards.
Then you will notice the statement let sheet = ss.getSheets()[a] inside the loop. Take note that a variable whose value keeps changing should not be declared with const, so let is used instead.
Then you will see a function autoCount(). It is a for loop that counts the number of rows that have values in them. The check if (values[i][0] === '') walks through the values of the sheet, looking at row i and column 0. Here, 0 indicates the first column of the sheet and i is the row. Yes, it works like a JSON object with a pandas feeling.
Then you get the number of filled rows by running autoCount(), and store the result in a rowNum variable.
Then pass that rowNum into a new for loop, and use if (sheet.getRange(j+1,7).getValue() === '') to determine which rows do not have a timestamp yet. Take note that the 7 here indicates the 7th column of the sheet, which is where I want the timestamp.
Inside the for loop, setValue writes the date, formatted as "yyyy-MM-dd hh:mm:ss". You are free to change it to any style you like.
Oh, and do remember to set up a trigger with the event type On Change. That is not limited to edits, but covers all kinds of changes, including paste.
Here's a screenshot on how it would look like:
Lastly, please take note on some of my backgrounds before deciding to or not to have the solution to work for your case. Cheers, and happy coding~!
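Stripped of the Apps Script calls, the idea is "stamp a row once, and never touch a stamp that already exists". Here is a minimal Python sketch of that logic on a plain list of rows. The function name `stamp_rows` and its parameters are made up for illustration, and unlike the script above it stops exactly at the first row whose must-fill column is empty.

```python
from datetime import datetime


def stamp_rows(rows, key_col=0, stamp_col=6, now=None):
    """Fill stamp_col for data rows that have a key but no timestamp yet.

    rows: list of lists, header first; empty cells are "".
    Returns the 0-based indexes of the rows that were stamped.
    """
    if now is None:
        now = datetime.now()
    stamped = []
    for index, row in enumerate(rows[1:], start=1):  # skip the header row
        if row[key_col] == "":
            break  # first row without the must-fill value ends the data set
        if row[stamp_col] == "":
            row[stamp_col] = now  # existing timestamps are left untouched
            stamped.append(index)
    return stamped


fixed_now = datetime(2020, 1, 1, 12, 0, 0)
sheet_rows = [
    ["task", "", "", "", "", "", "stamp"],
    ["a", "", "", "", "", "", ""],
    ["b", "", "", "", "", "", "old"],
    ["", "", "", "", "", "", ""],
    ["c", "", "", "", "", "", ""],
]
newly_stamped = stamp_rows(sheet_rows, now=fixed_now)
```

Because the timestamp is written as a value rather than computed by a formula such as NOW(), it does not change when the sheet recalculates.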

Errors when grouping by list in Power Query

I have a set of unique items (Index) to each of which are associated various elements of another set of items (in this case, dates).
In real life, if a date is associated with an index, an item associated with that index appeared in a file generated on that date. For each combination of dates that actually occurs, I want to know which accounts were present.
let
    Source = Table.FromRecords({
        [Idx = 0, Dates = {#date(2016,1,1), #date(2016,1,2), #date(2016,1,3)}],
        [Idx = 1, Dates = {#date(2016,2,1), #date(2016,2,2), #date(2016,2,3)}],
        [Idx = 2, Dates = {#date(2016,1,1), #date(2016,1,2), #date(2016,1,3)}]},
        type table [Idx = number, Dates = {date}]),
    // Group by
    Grouped = Table.Group(Source, {"Dates"}, {{"Idx", each List.Combine({[Idx]}), type {number}}}),
    // Clicking on the item in the top left corner generates this code:
    Navigation = Grouped{[Dates={...}]}[Dates],
    // Which returns this error: "Expression.Error: Value was not specified"
    // My own code to reference the same value returns {0,2} as expected.
    CorrectValue = Grouped{0}[Idx],
    // If I re-make the table as below the above error does not occur.
    ReMakeTable = Table.FromColumns(Table.ToColumns(Grouped), Table.ColumnNames(Grouped))
in ReMakeTable
It seems that I can use the results of this in my later work even without the re-make (I just can't preview cells correctly), but I'd like to know what's going on that causes the error and the odd code at the Navigation step, and why it disappears after the ReMakeTable step.
This happens because when you double-click an item, the auto-generated code selects the row with a value filter instead of the row index that you are using to get the single row from the table. Since the value here is a list, that list should appear in place of {...}. The UI probably can't handle lists in this situation, so it inserts {...}, which is indeed an incorrect value.
Thus, this line of code should look like:
Navigate = Grouped{[Dates = {#date(2016,1,1), #date(2016,1,2), #date(2016,1,3)}]}[Idx],
Then it will use value filter.
This is a bug in the UI. The index the UI calculates is incorrect: it should be 0 instead of [Dates={...}]. ... is a placeholder value, and it generates the "Value was not specified" exception if it is not replaced.
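For readers who think in Python rather than M, the grouping the question is after ("for each combination of dates, which indexes were present") can be sketched as below. This is an analogue only, not Power Query: the function name `group_by_dates` is hypothetical, and the list of dates is converted to a tuple because a Python dict key must be hashable.

```python
from collections import defaultdict
from datetime import date


def group_by_dates(records):
    """Map each distinct combination of dates to the indexes that carry it."""
    groups = defaultdict(list)
    for idx, dates in records:
        groups[tuple(dates)].append(idx)
    return dict(groups)


january = [date(2016, 1, 1), date(2016, 1, 2), date(2016, 1, 3)]
february = [date(2016, 2, 1), date(2016, 2, 2), date(2016, 2, 3)]
grouped = group_by_dates([(0, january), (1, february), (2, january)])
```

Looking a group up by its full key, `grouped[tuple(january)]`, corresponds to the value filter in the corrected Navigate step, while taking the first entry by position corresponds to `Grouped{0}`.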

How can I more efficiently find the height of a table using Python

I am using openpyxl to copy data from an Excel spreadsheet. The data is a table for an inventory database, where each row is an entry in the database. I read the table one row at a time using a for loop. In order to determine the range of the for loop, I wrote a function that examines each cell in the table to find the height of the table.
Code:
def find_max(self, sheet, row, column):
    max_row = 0
    cell_top = sheet.cell(row = row - 1, column = column)
    while cell_top.value != None:
        cell = sheet.cell(row = row, column = column)
        max = 0
        while cell.value != None or sheet.cell(row = row + 1, column = column).value != None:
            row += 1
            max = max + 1
            cell = sheet.cell(row = row, column = column)
        if max > max_row:
            max_row = max
        cell_top = sheet.cell(row = row, column = column + 1)
    return max_row
To summarize the function, I move to the next column in the worksheet and then iterate through every cell in that sheet, keeping track of its height until there are no more columns. The catch about this function is that it has to find two empty cells in a row in order to fail the condition. In a previous version I used a similar approach, but only used one column and stopped as soon as I found a blank cell. I had to change it so the program would still run if the user forgot to fill out a column. This function works okay for a small table, but on a table with several hundred entries this makes the program run much slower.
My question is this: What can I do to make this more efficient? I know nesting a while loop like that makes a program take longer but I do not see how to get around it. I have to make the program as foolproof as possible, so I need to check more than one column to stop user errors from failing the program
This is untested, but every time I've used openpyxl, I iterate over all rows like so:
for row in active_worksheet:
    do_something_to(row)
so you could count like:
count = 0
for row in active_worksheet:
    count += 1
EDIT: This is a better solution: Is it possible to get an Excel document's row count without loading the entire document into memory?
Read-only mode works row-by-row on the source, so you probably want to hook into it. Alternatively, you could pass the cells of a worksheet into something like a Pandas matrix, which has indices for empty cells.
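A single pass over the rows is enough to find the table height while still tolerating blank cells or blank rows in the middle. The sketch below works on any iterable of row tuples; the function name `table_height` is hypothetical. With openpyxl, the assumption is that you would feed it `sheet.iter_rows(min_row=..., values_only=True)`, which yields one tuple of cell values per row.

```python
def table_height(rows):
    """Return the 1-based position of the last row that has any value.

    rows: iterable of tuples of cell values, with None for empty cells.
    A row counts as filled if at least one of its cells is not None, so a
    user leaving one column empty does not cut the table short.
    """
    height = 0
    for number, row in enumerate(rows, start=1):
        if any(cell is not None for cell in row):
            height = number
    return height


sample = [
    ("widget", 3),
    (None, None),      # a fully blank row in the middle
    (None, 7),         # the user forgot the first column
    (None, None),
]
height = table_height(sample)
```

Each row is visited once, so the cost grows linearly with the number of rows instead of re-scanning the sheet column by column.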

I can't seem to swap the location of parallel nodes/subtrees within a pugixml document....?

I need to re-sequence the majority of child nodes at one level within my document.
The document has a structure that looks (simplified) like this:
sheet
  table
    row
      parameters
    row
      parameters
    row
      parameters
    row
      cell
        header string
      cell
        header string
      cell
        header string
    data row A
      cell
        data
      cell
        data
      cell
        data
    data row B
      cell
        data
      cell
        data
      cell
        data
    data row C
      cell
        data
      cell
        data
      cell
        data
    data row D
      cell
        data
      cell
        data
      cell
        data
    data row E
      cell
        data
      cell
        data
      cell
        data
    row
      parameters
    row
      parameters
    row
      parameters
    row
      parameters
    row
      parameters
I'm using pugixml now to load, parse, traverse, and access the large xml file, and I'm ultimately processing out a new sequence of the data rows. I know I'm parsing everything correctly and, looking at the resequence results, I can see that the reading and processing is correct. The resequence solution after all my optimizing and processing is a list of indices in a revised order, like { D,A,E,C,B } for the example above. So now I need to actually resequence them into this new order and then output the resulting xml to a new file. The actual data is about 16 meg, with several hundred data element row nodes and more than a hundred data elements for each row.
I've written a routine to swap two data rows, but something I'm doing is destroying the xml structural consistency during the swaps. I'm sure I don't understand the way pugi is moving nodes around and/or invalidating node handles.
I create and set aside node handles -- pugi::xml_node -- to the "table" level node, to the "header" row node, and to the "first data" row node, which in the original form above would be node "data row A". I know these handles give me correct access to the right data -- I can pause execution and look into them during the optimization and resequencing calculations and examine the rows and their siblings and see the input order.
The "header row" is always a particular child of the table, and the "first data row" is always the sibling immediately after the "header row". So I set these up when I load the file and check them for data consistency.
My understanding of node::insert_copy_before is this:
pugi::xml_node new_node_handle_in_document = parentnode.insert_copy_before( node_to_be_copied_to_child_of_parent , node_to_be_copied_nodes_next_sibling )
My understanding is that a deep recursive clone of node_to_be_copied_to_child_of_parent with all children and attributes will be inserted as the sibling immediately before node_to_be_copied_nodes_next_sibling, where both are children of parentnode.
Clearly, if node_to_be_copied_nodes_next_sibling is also the "first data row", then the node handle to the first data row may still be valid after the operation, but will no longer actually be a handle to the first data node. But will using insert_copy on the document force updates to individual node handles in the vicinity -- or not -- of the changes?
So let's look at the code I'm trying to make work:
// a method to switch data rows
bool switchDataRows( int iRow1 , int iRow2 )
{
    // temp vars
    int iloop;

    // navigate to the first row and create a handle that can move along siblings until we find the target
    pugi::xml_node xmnRow1 = m_xmnFirstDataRow;
    for ( iloop = 0 ; iloop < iRow1 ; iloop++ )
        xmnRow1 = xmnRow1.next_sibling();

    // navigate to the second row and create another handle that can move along siblings until we find the target
    pugi::xml_node xmnRow2 = m_xmnFirstDataRow;
    for ( iloop = 0 ; iloop < iRow2 ; iloop++ )
        xmnRow2 = xmnRow2.next_sibling();

    // ok.... so now get convenient handles on the locations of the two nodes by creating handles to the nodes AFTER each
    pugi::xml_node xmnNodeAfterFirstNode = xmnRow1.next_sibling();
    pugi::xml_node xmnNodeAfterSecondNode = xmnRow2.next_sibling();
    // at this point I know all the handles I've created are pointing towards the intended data.

    // now copy the second to the location before the first
    pugi::xml_node xmnNewRow2 = m_xmnTableNode.insert_copy_before( xmnRow2 , xmnNodeAfterFirstNode );
    // here's where my concern begins. Does this copy do what I want it to do, moving a copy of the second target row into the position under the table node
    // as the child immediately before xmnNodeAfterFirstNode ? If it does, might this operation invalidate other handles to data row nodes? Are all bets off as
    // soon as we do an insert/copy in a list of siblings, or will handles to other nodes in that list of children remain valid?

    // now copy the first to the spot before the second
    pugi::xml_node xmnNewRow1 = m_xmnTableNode.insert_copy_before( xmnRow1 , xmnNodeAfterSecondNode );
    // clearly, if other handles to data row nodes have been invalidated by the first insert_copy, then these handles aren't any good any more...

    // now delete the old rows
    bool bDidRemoveRow1 = m_xmnTableNode.remove_child( xmnRow1 );
    bool bDidRemoveRow2 = m_xmnTableNode.remove_child( xmnRow2 );
    // this is my attempt to remove the original data row nodes after they've been copied to their new locations

    // we have to update the first data row!!!!!
    bool bDidRowUpdate = updateFirstDataRow(); // a routine that starts with the header row node and finds the first sibling, the first data row
    // as before, if using the insert_copy methods result in many of the handles moving around, then I won't be able to base an update of the "first data row node"
    // handle on the "known" handle to the header data row node.

    // return the result
    return( bDidRemoveRow2 && bDidRemoveRow1 && bDidRowUpdate );
}
As I said, this destroys the structural consistency of the resulting xml. I can save it, but nothing will read it except notepad. The table ends up being somewhat garbled. If I try to use my own program to read it, the reader reports an "element mismatch" error and refuses to load it, understandably.
So I'm doing one or more things wrong. What are they?
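Separately from the pugixml handle question, the target operation (apply a whole permutation such as { D,A,E,C,B } to a run of sibling rows) can be illustrated in Python with the standard library's xml.etree.ElementTree. This is a sketch of the idea only, not pugixml code and not a diagnosis of the routine above. The function name `resequence_rows` and the `id` attributes are hypothetical.

```python
import xml.etree.ElementTree as ET


def resequence_rows(table, first_data_index, new_order):
    """Reorder a contiguous run of child rows of `table` in one step.

    first_data_index: position of the first data row among table's children.
    new_order: permutation of range(len(new_order)); new_order[k] is the
    original offset of the row that should end up at offset k.
    """
    if sorted(new_order) != list(range(len(new_order))):
        raise ValueError("new_order must be a permutation")
    children = list(table)
    data_rows = children[first_data_index:first_data_index + len(new_order)]
    reordered = [data_rows[i] for i in new_order]
    # Detach every data row first, then re-insert them in the new order,
    # so no position is computed while the sibling list is half-modified.
    for row in data_rows:
        table.remove(row)
    for offset, row in enumerate(reordered):
        table.insert(first_data_index + offset, row)


table = ET.fromstring(
    "<table><row id='params'/><row id='header'/>"
    "<row id='A'/><row id='B'/><row id='C'/><row id='D'/><row id='E'/>"
    "<row id='tail'/></table>"
)
resequence_rows(table, 2, [3, 0, 4, 2, 1])  # D, A, E, C, B
order = [row.get("id") for row in table]
```

Applying the full permutation at once avoids chaining pairwise swaps, where each swap changes the positions that the next swap's indexes refer to.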

Tables got over-written

I want to loop through a dbf and create a Word table for each record meeting the condition, but I got a one-page report with only the last record in a single table. It looks like all records are written to the same table. I tried to use n = n + 1 and pass the variable as the index into the tables collection:
oTable = oDoc.tables[n]
But it seems it only supports a numeric literal rather than a variable?
You have to add each table as you go, making sure to leave space in between them (because Word likes to combine tables).
You'll need something like this inside your loop:
* Assumes you start with oDoc pointing to the document,
* oRange set to an empty range at the beginning of the area where you want to add the tables,
* and that nRows and nCols give you the size of the table.
oTable = oDoc.Tables.Add(m.oRange, m.nRows, m.nCols)
oRange = oTable.Range()
oRange.Collapse(0)
oRange.InsertParagraphAfter()
oRange.Collapse(0)
After this code, you can use oTable to add the data you want to add. Then, on the next time through the loop, you're ready to add another table below the one you just filled.

Resources