I need to find a faster solution to iterate rows in Google App Script - algorithm

I'm trying to save some rows values for multiple columns on multiple tabs in GAS, but it's taking a lot of time and I'd like to find a faster way of doing this, if there's any.
A project e.g:'Project1' -as a key- has a value associated with it which corresponds to the column where it's stored, the tabs are 600+ iterations long.
this script opens up a tab called 'person1' at first and goes through all the rows for the column that corresponds to that project in 'projects' dictionary (it's the same format for every tab, but more projects will be added in the future)
right now i'm iterating through the 'members' dictionary (length=m), then through the projects dictionary (length=p) and finally through the length of the rows (length='r'), in the meantime it access the other spreadsheet where I want to save all those rows.
This means that the current time complexity of my algorithm is O(mpr) and it's WAY too slow.
for 15 people and 6 projects each, the amount of iterations would be 156600+ = 54,000 iterations at least (more people and more projects and more rows will be added).
is there any way to make my algorithm faster?
const members = {'Person1':'P1', 'Person2':'P2'};
const projects = {'Project1':'L','Project2':'R'}
function saveRowValue() {
let sourceSpreadsheet = SpreadsheetApp.getActiveSpreadsheet();
let targetSpreadsheet = SpreadsheetApp.openById('-SPREADSHEET-');
let targetSheet = targetSpreadsheet.getSheetByName('Tracking time');
let rowsToWrite = [];
rowsToWrite.push(['Project', 'Initials', 'Date', 'Tracking time'])
var rowsToSave = 1;
for(m in members){
Logger.log(m +' initials:'+ members[m]);
let sourceSheet = sourceSpreadsheet.getSheetByName(m);
for(p in projects){
let values = sourceSheet.getRange(projects[p]+"1:"+projects[p]).getValues();
Logger.log(values)
let list = [null, 0,''];
for(var i=0; i<values.length; i++){
try{
date = sourceSheet.getRange('B'+i).getValue();
let val = sourceSheet.getRange(projects[p]+i)
val = Utilities.formatDate(val.getValue(), "GMT", val.getNumberFormat())
Logger.log(val);
if(!(list.includes(val)) && date instanceof Date){
//rowsToWrite.push();
rowsToSave++;
targetSheet.getRange(rowsToSave,1,1,4).setValues([[p, members[m], date, val]]);
}
}catch(e){
Logger.log(e)
}
}
}
}
Logger.log(rowsToWrite);
[Here you can see how much time it takes to iterate 600 rows for a single project and a single member after changing what Yuri Khristich told me to change][1]
[1]: https://i.stack.imgur.com/CnRZY.png

First step is to try to get rid of getValue() and setValue() in loops. All data should be captured at once as 2D arrays in one step and put on the sheet in one step as well. No single cell or single row operations.
Next trick depends on your workflow. Say, it's unlikely that every time all 54000+ cells need to be checked. Probably there are ranges that have no changes. You can figure out some way to indicate the changes. And process only the changed ranges. Probably, the indication could be performed with onChange() trigger. For example you can add * to the name of the sheets and columns where changes have occurred and remove these * whenever you run your script.
Reference:
Use batch operations

Related

Google Sheets add a Permanent timestamp

I am setting up a sheet where a person will be able to check a checkbox, in different times, depending on the progress of a task. So, there are 5 checkboxes per row, for a number of different tasks.
Now, the idea is that, when you check one of those checkboxes, a message builds up in the few next cells coming after. So, the message is built in 3 cells. The first cell is just text, the second one is the date, and the third one is time. Also, those cells have 5 paragraphs each (one per checkbox).
The problem comes when I try to make that timestamp stay as it was when it was entered. As it is right now, the time changes every time I update any part of the Google Sheet.
I set u my formulas as follows:
For the text message:
=IF($C4=TRUE,"Insert text 1 here","")&CHAR(10)&IF($E4=TRUE, "Insert text here","")&CHAR(10)&IF($G4=TRUE, "Insert text 3 here","")&CHAR(10)&IF($I4=TRUE, "Insert text 4 here,"")&CHAR(10)&IF($K4=TRUE, "Insert text 5 here","")
For the date:
=IF($C4=TRUE,(TEXT(NOW(),"mmm dd yyyy")),"")&CHAR(10)&IF($E4=TRUE,(TEXT(NOW(),"mmm dd yyyy")),"")&CHAR(10)&IF($G4=TRUE,(TEXT(NOW(),"mmm dd yyyy")),"")&CHAR(10)&IF($I4=TRUE,(TEXT(NOW(),"mmm dd yyyy")),"")&CHAR(10)&IF($K4=TRUE,(TEXT(NOW(),"mmm dd yyyy")),"")
And for the time:
=IF($C4=TRUE,(TEXT(NOW(),"HH:mm")),"")&CHAR(10)&IF($E4=TRUE,(TEXT(NOW(),"HH:mm")),"")&CHAR(10)&IF($G4=TRUE,(TEXT(NOW(),"HH:mm")),"")&CHAR(10)&IF($I4=TRUE,(TEXT(NOW(),"HH:mm")),"")&CHAR(10)&IF($K4=TRUE,(TEXT(NOW(),"HH:mm")),"")
And it all looks like this:
I would appreciate it greatly if anyone could help me get this to work so that date and time are inserted after checking those boxes and they donĀ“t change again
Notice that your struggle with the continuous changing date time. I had the same struggle as yours over the year, and I found a solution that works for my case nicely. But it needs to be a little more "dirty work" with Apps Script
Some background for my case:
I have multiple sheets in the spreadsheet to run and generate the
timestamp
I want to skip my first sheet without running to generate timestamp
in it
I want every edit, even if each value that I paste from Excel to
generate timestamp
I want the timestamp to be individual, each row have their own
timestamp precise to every second
I don't want a total refresh of the entire sheet timestamp when I am
editing any other row
I have a column that is a MUST FILL value to justify whether the
timestamp needs to be generated for that particular row
I want to specify my timestamp on a dedicated column only
function timestamp() {
const ss = SpreadsheetApp.getActiveSpreadsheet();
const totalSheet = ss.getSheets();
for (let a=1; a<totalSheet.length; a++) {
let sheet = ss.getSheets()[a];
let range = sheet.getDataRange();
let values = range.getValues();
function autoCount() {
let rowCount;
for (let i = 0; i < values.length; i++) {
rowCount = i
if (values[i][0] === '') {
break;
}
}
return rowCount
}
rowNum = autoCount()
for(let j=1; j<rowNum+1; j++){
if (sheet.getRange(j+1,7).getValue() === '') {
sheet.getRange(j+1,7).setValue(new Date()).setNumberFormat("yyyy-MM-dd hh:mm:ss");
}
}
}
}
Explanation
First, I made a const totalSheet with getSheets() and run it
with a for loop. That is to identify the total number of sheets
inside that spreadsheet. Take note, in here, I made let a=1;
supposed all JavaScript the same, starts with 0, value 1 is to
skip the first sheet and run on the second sheet onwards
then, you will notice a function let sheet = ss.getSheets()[a]
inside the loop. Take note, it is not supposed to use const if
your value inside the variable is constantly changing, so use
let instead will work fine.
then, you will see a function autoCount(). That is to make a for
loop to count the number of rows that have values edited in it. The
if (values[i][0] === '') is to navigate the script to search
through the entire sheet that has value, looking at the row i and
the column 0. Here, the 0 is indicating the first column of the
sheet, and the i is the row of the sheet. Yes, it works like a
json object with panda feeling.
then, you found the number of rows that are edited by running the
autoCount(). Give it a rowNum variable to contain the result.
then, pass that rowNum into a new for loop, and use if (sheeet.getRange(j+1,7).getValue() === '') to determine which row
has not been edited with timestamp. Take note, where the 7 here
indicating the 7th column of the sheet is the place that I want a
timestamp.
inside the for loop, is to setValue with date in a specified
format of ("yyyy-MM-dd hh:mm:ss"). You are free to edit into any
style you like
ohya, do remember to deploy to activate the trigger with event type
as On Change. That is not limiting to edit, but for all kinds of
changes including paste.
Here's a screenshot on how it would look like:
Lastly, please take note on some of my backgrounds before deciding to or not to have the solution to work for your case. Cheers, and happy coding~!

Azure Search Scoring Profile Magnitude by Downloads

I am new to Azure Search so I just want to run this by before I try to implement it. We have a search setup on items and we want to score/rank the results based on its initial score and how many times the item has been used/downloaded. We want the items downloaded the most to appear at the top of the result list.
We have a separate field in the search index that contains the used/download count (itemCount).
I know I have to set up a Magnitude profile but I am not sure what to use for the range as the itemCount can contain 0 - N So do I just set the range to be some large number i.e. 100,000,000 or what is the best practice?
var functionRankByDownload = new MagnitudeFunction()
{
Boost = 1000,
BoostingRangeStart = 0,
BoostingRangeEnd = 100000000,
ConstantBoostBeyondRange = true,
FieldName = "itemCount",
Interpolation = InterpolationTypes.Linear
};
scoringProfile1.Functions = new List() { functionRankByDownload };
I found the score calculation is as follows:
((initialScore * boost * itemCount) - min) / (max-min)
So it seems like it should work ok having a large value for the max but again just wanting to know the best practice.
Thanks!
That seems reasonable. The BoostingRangeEnd can be any reasonable bound to your range depending on the scenario. Since, you are using ConstantBoostBeyondRange, it would also take care of boosting values outside ranges appropriately.
You might also want to experiment with the boost value for a large range like this and see if a bigger boost value is more helpful for your scenario.

Errors when grouping by list in Power Query

I have a set of unique items (Index) to each of which are associated various elements of another set of items (in this case, dates).
In real life, if a date is associated with an index, an item associated with that index appeared in a file generated on that date. For combination of dates that actually occurs, I want to know which accounts were present.
let
Source = Table.FromRecords({
[Idx = 0, Dates = {#date(2016,1,1), #date(2016,1,2), #date(2016,1,3)}],
[Idx = 1, Dates = {#date(2016,2,1), #date(2016,2,2), #date(2016,2,3)}],
[Idx = 2, Dates = {#date(2016,1,1), #date(2016,1,2), #date(2016,1,3)}]},
type table [Idx = number, Dates = {date}]),
// Group by
Grouped = Table.Group(Source, {"Dates"}, {{"Idx", each List.Combine({[Idx]}), type {number}}}),
// Clicking on the item in the top left corner generates this code:
Navigation = Grouped{[Dates={...}]}[Dates],
// Which returns this error: "Expression.Error: Value was not specified"
// My own code to reference the same value returns {0,2} as expected.
CorrectValue = Grouped{0}[Idx],
// If I re-make the table as below the above error does not occur.
ReMakeTable = Table.FromColumns(Table.ToColumns(Grouped), Table.ColumnNames(Grouped))
in ReMakeTable
It seems that I can use the results of this in my later work even without the Re-make (I just can't preview cells correctly), but I'd like to know if what's going on that causes the error and the odd code at the Navigation step, and why it disappears after the ReMakeTable step.
This happens because when you double click an item, the auto-generated code uses value filter instead of row index that you are using to get the single row from the table. And since you have a list as a value, it should be used instead of {...}. Probably UI isn't capable to work with lists in such a situation, and it inserts {...}, and this is indeed an incorrect value.
Thus, this line of code should look like:
Navigate = Grouped{[Dates = {#date(2016,1,1), #date(2016,1,2), #date(2016,1,3)}]}[Idx],
Then it will use value filter.
This is a bug in the UI. The index the UI calculates is incorrect: it should be 0 instead of [Dates={...}]. ... is a placeholder value, and it generates the "Value was not specified" exception if it is not replaced.

How can I more efficiently find the height of a table using Python

I am using openpyxl to copy data from an Excel spreadsheet. The data is a table for an inventory database, where each row is an entry in the database. I read the table one row at a time using a for loop. In order to determine the range of the for loop, I wrote a function that examines each cell in the table to find the height of the table.
Code:
def find_max(self, sheet, row, column):
max_row = 0
cell_top = sheet.cell(row = row - 1, column = column)
while cell_top.value != None:
cell = sheet.cell(row = row, column = column)
max = 0
while cell.value != None or sheet.cell(row = row + 1, column = column).value != None:
row += 1
max = max + 1
cell = sheet.cell(row = row, column = column)
if max > max_row:
max_row = max
cell_top = sheet.cell(row = row, column = column + 1)
return max_row
To summarize the function, I move to the next column in the worksheet and then iterate through every cell in that sheet, keeping track of its height until there are no more columns. The catch about this function is that it has to find two empty cells in a row in order to fail the condition. In a previous version I used a similar approach, but only used one column and stopped as soon as I found a blank cell. I had to change it so the program would still run if the user forgot to fill out a column. This function works okay for a small table, but on a table with several hundred entries this makes the program run much slower.
My question is this: What can I do to make this more efficient? I know nesting a while loop like that makes a program take longer but I do not see how to get around it. I have to make the program as foolproof as possible, so I need to check more than one column to stop user errors from failing the program
This is untested, but every time I've used openpyxl, I iterate over all rows like so:
for row in active_worksheet:
do_something_to(row)
so you could count like:
count = 0
for row in active_worksheet:
count += 1
EDIT: This is a better solution: Is it possible to get an Excel document's row count without loading the entire document into memory?
Read-only mode works row-by-row on the source so you probably want to hook it into it. Alternatively, you could pass the cells of the of a worksheet into something like a Pandas matrix which has indices for empty cells.

Efficient way to delete multiple rows in HBase

Is there an efficient way to delete multiple rows in HBase or does my use case smell like not suitable for HBase?
There is a table say 'chart', which contains items that are in charts. Row keys are in the following format:
chart|date_reversed|ranked_attribute_value_reversed|content_id
Sometimes I want to regenerate chart for a given date, so I want to delete all rows starting from 'chart|date_reversed_1' till 'chart|date_reversed_2'. Is there a better way than to issue a Delete for each row found by a Scan? All the rows to be deleted are going to be close to each other.
I need to delete the rows, because I don't want one item (one content_id) to have multiple entries which it will have if its ranked_attribute_value had been changed (its change is the reason why chart needs to be regenerated).
Being a HBase beginner, so perhaps I might be misusing rows for something that columns would be better -- if you have a design suggestions, cool! Or, maybe the charts are better generated in a file (e.g. no HBase for output)? I'm using MapReduce.
Firstly, coming to the point of range delete there is no range delete yet in HBase, AFAIK. But there is a way to delete more than one rows at a time in the HTableInterface API. For this simply form a Delete object with row keys from scan and put them in a List and use the API, done! To make scan faster do not include any column family in the scan result as all you need is the row key for deleting whole rows.
Secondly, about the design. First my understanding of the requirement is, there are contents with content id and each content has charts generated against them and those data are stored; there can be multiple charts per content via dates and depends on the rank. In addition we want the last generated content's chart to show at the top of the table.
For my assumption of the requirement I would suggest using three tables - auto_id, content_charts and generated_order. The row key for content_charts would be its content id and the row key for generated_order would be a long, which would auto-decremented using HTableInterface API. For decrementing use '-1' as the amount to offset and initialize the value Long.MAX_VALUE in the auto_id table at the first start up of the app or manually. So now if you want to delete the chart data simply clean the column family using delete and then put back the new data and then make put in the generated_order table. This way the latest insertion will also be at the top in the latest insertion table which will hold the content id as a cell value. If you want to ensure generated_order has only one entry per content save the generated_order id first and take the value and save it into content_charts when putting and before deleting the column family first delete the row from generated_order. This way you could lookup and charts for a content using 2 gets at max and no scan required for the charts.
I hope this is helpful.
You can use the BulkDeleteProtocol which uses a Scan that defines the relevant range (start row, end row, filters).
See here
I ran into your situation and this is my code to implement what you want
Scan scan = new Scan();
scan.addFamily("Family");
scan.setStartRow(structuredKeyMaker.key(starDate));
scan.setStopRow(structuredKeyMaker.key(endDate + 1));
try {
ResultScanner scanner = table.getScanner(scan);
Iterator<Entity> cdrIterator = new EntityIteratorWrapper(scanner.iterator(), EntityMapper.create(); // this is a simple iterator that maps rows to exact entity of mine, not so important !
List<Delete> deletes = new ArrayList<Delete>();
int bufferSize = 10000000; // this is needed so I don't run out of memory as I have a huge amount of data ! so this is a simple in memory buffer
int counter = 0;
while (entityIterator.hasNext()) {
if (counter < bufferSize) {
// key maker is used to extract key as byte[] from my entity
deletes.add(new Delete(KeyMaker.key(entityIterator.next())));
counter++;
} else {
table.delete(deletes);
deletes.clear();
counter = 0;
}
}
if (deletes.size() > 0) {
table.delete(deletes);
deletes.clear();
}
} catch (IOException e) {
e.printStackTrace();
}

Resources