Is there a way to improve speed/performance of google sheets functions? - performance

I recently worked on a project in google sheets. While everything is working, changes in the page take a while to process and load. There are a few parts of the project (described below). Is there a way to find out what is causing the biggest load on the project, so that I can work on that area?
Alternatively, if someone has experience with the following types of functions, what do you think is causing the biggest slowdown:
I have a query to lookup and match values. Would this be faster as a vlookup with sort in an arrayformula?
=IFERROR(QUERY(Record!A:C,"Select A where B = '"&B7&"' order by A desc limit 1 label A ''"),"")
I have random number generation through an arrayformula:
=ARRAYFORMULA(IF(ROW(B6:B)=6,"RANDOM",IF(ISBLANK(B6:B),"",RANDBETWEEN(0+0*ROW(B6:B),COUNTA(B6:B)))))
This fills in a cell with a random number if the one next to it has a value. I use this for random sampling in another query later.
I have some conditional formatting based on whether the cell has something in it.
I have some data validation based on a range of cells.
Note: Since my main question is about performance, I didn't think having an example file would be beneficial. It would take me a little to make one so if Ii should, let me know. Also, since other questions deal with scripting performance (like this one and this one) I feel like my question is different.

I suspect the RANDBETWEEN formula is your biggest culprit - basically every time the spreadsheet changes in any way whatsoever, even if you dont actually edit, the numbers all change, so inside an arrayformula, depending on how many rows you have, its always recalculating the rand for every single row

Related

LabVIEW - How to accumulate data in array?

I made a program aimed to simulate the intensity of light when many light bulbs are put together. I have intensity data of one bulb in xls.-files. So, I want to program to work as follows.
Open the xls.-file and get the data.
Put the data into different positions. I put one data set (one bulb) in each excel sheet. This is to simulate putting the bulb in different places.
Sum the data in the same cell across the different sheets.
My LabVIEW front panel and block diagram are:
My problem is this program runs too slowly. How should I improve this? I have an idea to make a big array and accumulate data in that array. However, I do not know how to do it. The Insert Into Array and Replace Array Subset functions are not suitable for my purposes.
The most probable reason of slow performance is that you do a lot of operations on Excel file. You should rather read data into memory and operate on them in VI. At the end, if you need, you can update the Excel file with final results.
It would be difficult to tell you exactly how to do it. As you said, you're beginner and I think that the best way would be to simple do some LabVIEW exercises and gain more experience to understand how to work with arrays :) I recommend to take a look at examples (Help->Find Examples), read some user guides from ni.com or find other "getting started" materials on the Internet.
Check these, you may find them useful:
https://zone.ni.com/reference/en-XX/help/371361R-01/lvhowto/lv_getting_started/
https://www.ni.com/getting-started/labview-basics/data-structures
https://www.ni.com/pl-pl/support/documentation/supplemental/08/labview-arrays-and-clusters-explained.html

Google Sheets: Why does Iterative Calculation create duplicates, make formulas act weird?

I'm trying to make a NPC character sheet genetator for my Dungeons and Dragons setting in Google Sheets. Right now, I want to have Google Sheets distribute random numbers, but the highest of the numbers should go to the more important abilities for the selected class. And then I want it to distribute more points in a probably quite inefficient way, but that's not my main problem right now xD
It all works fine until I turn on Iterative Calculation, so some values can change based on each other in the table that shows which are the more important abilities for the selected class. Now there's random duplicates of values and sometimes Google Sheets fails to simply copy a range correctly. I'm not sure how much sense I'm making here, and I'm very much a beginner at this. Please, someone smart explain me what the problem is and how I can work my way around it! Thank you!
Link to the example:
https://docs.google.com/spreadsheets/d/12zrEADm4p7teHsl4WPzeuRe5yo0wCS54gt_9_b_Xh60/edit?usp=sharing
I highlighted the ranges that started acting weird in light red. The cells with formulas that depend on each other I highlighted with blue.
turn off the iterative calculation and use:
=ARRAYFORMULA(CHAR(64+SORT({1; 2}, 1, RANDBETWEEN(0, 1))))
update:
=INDEX({{"B";"A"},{"A";"A"},{"A";"B"}},, RANDBETWEEN(1, 3))

How to implement a spreadsheet in a browser?

I was recently asked this in an interview (Software Engineer) and didn't really know how to go about answering the question.
The question was focused on both the algorithm of the spreadsheet and how it would interact with the browser. I was a bit confused on what data structure would be optimal to handle the cells and their values. I guess any form of hash table would work with cells being the unique key and the value being the object in the cell? And then when something gets updated, you'd just update that entry in your table. The interviewer hinted at a graph but I was unsure of how a graph would be useful for a spreadsheet.
Other things I considered were:
Spreadsheet in a browser = auto-save. At any update, send all the data back to the server
Cells that are related to each other, i.e. C1 = C2+C3, C5 = C1-C4. If the value of C2 changes, both C1 and C5 change.
Usage of design patterns? Does one stand out over another for this particular situation?
Any tips on how to tackle this problem? Aside from the algorithm of the spreadsheet itself, what else could the interviewer have wanted? Does the fact that its in a browser as compared to a separate application add any difficulties?
Thanks!
For an interview this is a good question. If this was asked as an actual task in your job, then there would be a simple answer of use a third party component, there are a few good commercial ones.
While we can't say for sure what your interviewer wanted, for me this is a good question precisely because it is so open ended and has so many correct possible answers.
You can talk about the UI and how to implement the kind of dynamic grid you need for a spreadsheet and all the functionality of the cells and rows and columns and selection of cells and ranges and editing of values and formulas. You probably could talk for a while on the UI implications alone.
Alternatively you can go the data route, talk about data structures to hold a spreadsheet, talk exactly about links between cells for formulas, talk about how to detect and deal with circular references, talk about how in a browser you have less control over memory and for very large spreadsheets you could run into problems earlier. You can talk about what is available in JavaScript vs a native language and how this impacts the data structures and calculations. Also along with data, a big important issue with spreadsheets is numerical accuracy and floating point number calculations. Floating point numbers are made to be fast but are not necessarily accurate in extreme levels of precision and this leads to a lot of confusing questions. I believe very recently Excel switched to their own representation of a fixed decimal number as it's now viable to due spreadsheet level calculations without using the built-in floating point calculations. You can also talk about data structures and calculation and how they affect performance. In a browser you don't have threads (yet) so you can't run all the calculations in the background. If you have 100,000 rows with complex calculations and change one value that cascades across everything, you can get a warning about a slow script. You need to break up the calculation.
Finally you can run form the user experience angle. How is the experience in a browser different from a native application? What are the advantages and what cool things can you do in a browser that may be difficult in a desktop application? What things are far more complicated or even totally impossible (example, associate your spreadsheet app with a file type so a user can double-click a file and open it in your online spreadsheet app, although I may be wrong about that still being unsupported).
Good question, lots of right answers, very open ended.
On the other hand, you could also have had a bad interviewer that is specifically looking for the answer they want and in that case you're pretty much out of luck unless you're telepathic.
You can say hopelessly too much about this. I'd probably start with:
If most of the cells are filled, use a simply 2D array to store it.
Otherwise use a hash table of location to cell
Or perhaps something like a kd-tree, which should allow for more efficient "get everything in the displayed area" queries.
By graph, your interviewer probably meant have each cell be a vertex and each reference to another cell be a directed edge. This would allow you to do checks for circular references fairly easily, and allow for efficiently updating of all cells that need to change.
"In a browser" (presumably meaning "over a network" - actually "in a browser" doesn't mean all that much by itself - one can write a program that runs in a browser but only runs locally) is significant - you probably need to consider:
What are you storing locally (everything or just the subset of cells that are current visible)
How are you sending updates to the server (are you sending every change or keeping a collection of changed cells and only sending updates on save, or are you not storing changes separately and just sending the whole grid across during save)
Auto-save should probably be considered as well
Will you have an "undo", will this only be local, if not, how will you handle this on the server and how will you send through the updates
Is only this one user allowed to work with it at a time (or do you have to cater for multi-user, which brings dealing with conflicts, among other things, to the table)
Looking at the CSS cursor property just begs for one to create
a spreadsheet web application.
HTML table or CSS grid? HTML tables are purpose built for tabular
data.
Resizing cell height and width is achievable with offsetX and
offsetY.
Storing the data is trivial. It can be Mongo, mySQL, Firebase,
...whatever. On blur, send update.
Javascrip/ECMA is more than capable of delivering all the Excel built-in
functions. Did I mention web workers?
Need to increment letters as in column ID's? I got you covered.
Most importantly, don't do it. Why? Because it's already been done.
Find a need and work that project.

My Algorithm only fails for large values - How do I debug this?

I'm working on transcribing as3delaunay to Objective-C. For the most part, the entire algorithm works and creates graphs exactly as they should be. However, for large values (thousands of points), the algorithm mostly works, but creates some incorrect graphs.
I've been going back through and checking the most obvious places for error, and I haven't been able to actually find anything. For smaller values I ran the output of the original algorithm and placed it into JSON files. I then read that output in to my own tests (tests with 3 or 4 points only), and debugged until the output matched; I checked the output of the two algorithms line for line, and found the discrepancies. But I can't feasibly do that for 1000 points.
Answers don't need to be specific to my situation (although suggesting tools I can use would be excellent).
How can I debug algorithms that only fail for large values?
If you are transcribing an existing algorithm to Objective-C, do you have a working original in some other language? In that case, I would be inclined to put in print statements in both versions and debug the first discrepancy (the first, because later discrepancies could be knock-on errors).
I think it is very likely that the program also makes mistakes for smaller graphs, but more rarely. My first step would in fact be to use the working original (or some other means) to run a large number of automatically checked test runs on small graphs, hoping to find the bug on some more manageable input size.
Find the threshold
If it works for 3 or 4 items, but not for 1000, then there's probably some threshold in between. Use a binary search to find that threshold.
The threshold itself may be a clue. For example, maybe it corresponds to a magic value in the algorithm or to some other value you wouldn't expect to be correlated. For example, perhaps it's a problem when the number of items exceeds the number of pixels in the x direction of the chart you're trying to draw. The clue might be enough to help you solve the problem. If not, it may give you a clue as to how to force the problem to happen with a smaller value (e.g., debug it with a very narrow chart area).
The threshold may be smaller than you think, and may be directly debuggable.
If the threshold is a big value, like 1000. Perhaps you can set a conditional breakpoint to skip right to iteration 999, and then single-step from there.
There may not be a definite threshold, which suggests that it's not the magnitude of the input size, but some other property you should be looking at (e.g., powers of 10 don't work, but everything else does).
Decompose the problem and write unit tests
This can be tedious but is often extremely valuable--not just for the current issue, but for the future. Convince yourself that each individual piece works in isolation.
Re-visit recent changes
If it used to work and now it doesn't, look at the most recent changes first. Source control tools are very useful in helping you remember what has changed recently.
Remove code and add it back piece by piece
Comment out as much code as you can and still get some kind of reasonable output (even if that output doesn't meet all the requirements). For example, instead of using a complicated rounding function, just truncate values. Comment out code that adds decorative touches. Put assert(false) in any special case handlers you don't think should be activated for the test data.
Now verify that output, and slowly add back the functionality you removed, one baby step at a time. Test thoroughly at each step.
Profile the code
Profiling is usually for optimization, but it can sometimes give you insight into code, especially when the data size is too large for single-stepping through the debugger. I like to use line or statement counts. Is the loop body executing the number of times you expect? Or twice as often? Or not at all? How about the then and else clauses of those if statements? Logic bugs often become very obvious with this type of profiling.

Efficient way to represent locations, and query based on proximity?

I'm pondering over how to efficiently represent locations in a database, such that given an arbitrary new location, I can efficiently query the database for candidate locations that are within an acceptable proximity threshold to the subject.
Similar things have been asked before, but I haven't found a discussion based on my criteria for the problem domain.
Things to bear in mind:
Starting from scratch, I can represent data in any way (eg. long&lat, etc)
Any result set is time-sensitive, in that it loses validity within a short window of time (~5-15mins) so I can't cache indefinitely
I can tolerate some reasonable margin of error in results, for example if a location is slightly outside of the threshold, or if a row in the result set has very recently expired
A language agnostic discussion is perfect, but in case it helps I'm using C# MVC 3 and SQL Server 2012
A couple of first thoughts:
Use an external API like Google, however this will generate thousands of requests and the latency will be poor
Use the Haversine function, however this looks expensive and so should be performed on a minimal number of candidates (possibly as a Stored Procedure even!)
Build a graph of postcodes/zipcodes, such that from any node I can find postcodes/zipcodes that border it, however this could involve a lot of data to store
Some optimization ideas to reduce possible candidates quickly:
Cache result sets for searches, and when we do subsequent searches, see if the subject is within an acceptable range to a candidate we already have a cached result set for. If so, use the cached result set (but remember, the results expire quickly)
I'm hoping the answer isn't just raw CPU power, and that there are some approaches I haven't thought of that could help me out?
Thank you
ps. Apologies if I've missed previously asked questions with helpful answers, please let me know below.
What about using GeoHash? (refer to http://en.wikipedia.org/wiki/Geohash)

Resources