Maybe someone can shed any light, personal experience or reference to official documentation.
Suppose, I have a Google Spreadsheet, which I connected to other Spreadsheets by using IMPORTRANGE. I noticed that my receiving Spreadsheet started loading slower than normal. Are there any tricks for optimizing the loading speed? For example, does it make any difference if I:
Use IMPORTRANGE less frequently by loading the data (let's say, once) to a separate tab, and then query that tab internally from within the same spreadsheet?
or
Use IMPORTRANGE frequently in multiple cells and run Query for each cell individually, and avoid having a large dedicated tab that gets all the info first?
Use IMPORTRANGE less frequently by loading the data (let's say, once) to a separate tab, and then query that tab internally from within the same spreadsheet?
definitely the right approach to gain performance speed
Related
We're using PrimeNG's Turbo Table with dynamic columns to display 1000 records per page.Both server-side pagination and sorting are enabled. This works well in Chrome but it is staggeringly slow in IE during sorting, pagination or updating all the records. Once the server response is returned in ~9s, IE freezes for 2 min and then displays the data. Also, the table uses ngSwitch to determine the column content, for instance, some columns display icons, some display a text area and so on.
Enabling prodMode has helped improve the page load significantly and this is comparable to Chrome now, however we still have performance issues while sorting, pagination and updating the records.
We've tested the performance by removing ngSwitch and observed a slight improvement - a 10s reduction. However, we require the ngSwitch functionality, so removing it really isn't an option, unless there's an alternative we can use. Anyhow, it doesn't quite solve our problem.
Appreciate any help please!
It can be possible that IE needs more time than any other latest browsers, especially in the situation that you need to display 1000 records which is not a small amount. I don't know if it is possible for your-case to load less records per time. If possible, you could find a balance between the load time and records amounts in IE.
Without any sample code, we can only provide some general suggestions. In this situation, I can only suggest you to use F12 developer Performance tools to analyze UI performance, and use F12 developer Network tools to check the request spend time. If the Dev Tools doesn't work well in your IE, you could also use Windows Performance Toolkit to analyze the website performance.
I have one thousand Google Form Responses spreadsheets. These are students answer sheets. I built a spreadsheet and pull data (TimeStamps and scores) for each student by using Google Spreadsheet formulas (INDEX MATCH and IMPORTDATA). Each student has different pages. But, it takes too many times and sometimes causes some source sheets being unresponsive (I think because of heavy formula usage). My questions;
Is it possible to do the same thing (pulling data if matches student's name from one thousand spreadsheets) by using Google Script?
If possible, which ones (Google Spreadsheets with formulas or Google Script) performance is better?
By looking your answers I will decide to begin learning Google Script or not.
Thanks in advance.
Is it possible to do the same thing (pulling data if matches student's name from one thousand spreadsheets) by using Google Script?
Yes, it's possible.
NOTE: Bear in mind that Google Sheets has a 5 million cell limit, so if your data exceeds this limit, you should consider to use another data repository.
If possible, which ones (Google Spreadsheets with formulas or Google Script) performance is better?
Since most Google Sheets formulas are recalculated every time that a change is made in the spreadsheet that holds them, it's very likely that Google Apps Script will be better when using Google Sheets/Google Apps Script as database management system because we could have more control over when the database transactions will be made.
Related
Measurement of execution time of built-in functions for Spreadsheet
Why do we use SpreadsheetApp.flush();?
Both does the same thing. Both will be as intensive on your computer. My advice would be to upgrade your PC!
I am working in BluePrism Robotics Process Automation and trying to load an excel sheet with more than 100k records (It might go upwards of 300k in some cases).
I am trying to load internal work queue of BluePrism, but I get an error as quoted below:
'Load Data Into Queue' ERROR: Internal : Exception of type 'System.OutOfMemoryException' was thrown.
Is there a way to avoid this problem, in the way where I can free up more memory?
I plan to process records one by one from queue, and put them into new excel sheets categorically. Loading all that data in a collection and looping over it may be memory consuming, so I am trying to find out a more efficient way.
I welcome any and all help/tips.
Thanks!
Basic Solution:
Break up the number of Excel rows you are pulling into your Collection data item at any one time. The thresholds for this will depend on your resource system memory and architecture, as well as structure and size of the data in the Excel Worksheet. I've been able to quickly move 50k 10-column-rows from Excel to a Collection and then into the Blue Prism queue very quickly.
You can set this up by specifying the Excel Worksheet range to pull into the Collection data item, and then shift that range each time the Collection has been successfully added to the queue.
After each successful addition to the queue and/or before you shift the range and/or at a predefined count limit you can then run a Clean Up or Garbage Collection action to free up memory.
You can do all of this with the provided Excel VBO and an additional Clean Up object.
Keep in mind:
Even breaking it up, looping over a Collection this large to amend the data will be extremely expensive and slow. The most efficient way to make changes to the data will be at the Excel Workbook level or when it is already in the Blue Prism queue.
Best Bet: esqew's alternative solution is the most elegant and probably your best bet.
Jarrick hit it on the nose in that Work Queue items should provide the bot with information on what they are to be working on and a Control Room feedback space, but not the actual work data to be implemented/manipulated.
In this case you would want to just use the items Worksheet row number and/or some unique identifier from a single Worksheet column as the queue item data so that the bot can provide Control Room feedback on the status of the item. If this information is predictable enough in format there should be no need to move any data from the Excel Worksheet to a Collection and then into a Work Queue, but rather simply build the queue based on that data predictability.
Conversely you can also have the bot build the queue "as it happens", in that once it grabs the single row data from the Excel Worksheet to work it, can as well add a queue item with the row number of the data. This will then enable Control Room feedback and tracking. However, this would, in almost every case, be a bad practice as it would not prevent a row from being worked multiple times unless the bot checked the queue first, at which point you've negated the speed gains you were looking to achieve in cutting out the initial queue building in the first place. It would also be impossible to scale the process for multiple bots to work the Excel Worksheet data efficiently.
This is a common issue for RPA, especially if working with large excel files. As far as I know, there are no 100% solutions, but only methods reduce the symptoms. I have run into this problem several times and these are the ways I would try to handle them:
Disable or Errors only for stage logging.
Don`t log parameters on action stages (especially ones that work with the excel files)
Run Garbage collection process
See if it is possible to avoid reading excel files into BP collections and use OLEDB to query the file
See if it is possible to increase the Ram memory on the machines
If they’re using the 32-bit version of the app, then it doesn’t really matter how much memory you feed it, Blue Prism will cap out at 2 GB.
This is may be because of BP Server as the memory is shared between Processes and Work queue.Better option is to use two bots and multiple queues to avoid Memory Error.
If you're using Excel documents or CSV files, you can use the OLEDB object to connect and query against it as if it were a database. You can use the SQL syntax to limit the amount of rows that are returned at a time and paginate through them until you've reached the end of the document.
For starters, you are making incorrect use of the Work Queue in Blue Prism. The Work Queue should not be used to store this type and amount of data. (please read the BP documentation on Work Queues thoroughly).
Solving the issue at hand, being the misuse requires 2 changes:
Only store references in your Item Data which point to the Excel file containing the data.
If you're consulting this much data many times, perhaps convert the file into a CSV, write a VBO that queries the data directly in the CSV.
The first change is not just a recommendation, but as your project progresses and IT Architecture and InfoSec comes into play, it will be mandatory.
As for the CSV VBO, take a look at C#, it will make your life a lot easier than loading all this data into BP (time consuming, unreliable, ...).
In the past I used to build WebAnalytics using OLAP cubes running on MySQL.
Now an OLAP cube the way I used it is simply a large table (ok, it was stored a bit smarter than that) where each row is basically a measurement or and aggregated set of measurements. Each measurement has a bunch of dimensions (i.e. which pagename, useragent, ip, etc.) and a bunch of values (i.e. how many pageviews, how many visitors, etc.).
The queries that you run on a table like this are usually of the form (meta-SQL):
SELECT SUM(hits), SUM(bytes),
FROM MyCube
WHERE date='20090914' and pagename='Homepage' and browser!='googlebot'
GROUP BY hour
So you get the totals for each hour of the selected day with the mentioned filters.
One snag was that these cubes usually meant a full table scan (various reasons) and this meant a practical limitation on the size (in MiB) you could make these things.
I'm currently learning the ins and outs of Hadoop and the likes.
Running the above query as a mapreduce on a BigTable looks easy enough:
Simply make 'hour' the key, filter in the map and reduce by summing the values.
Can you run a query like I showed above (or at least with the same output) on a BigTable kind of system in 'real time' (i.e. via a user interface and the user get's their answer ASAP) instead of batch mode?
If not; what is the appropriate technology to do something like this in the realm of BigTable/Hadoop/HBase/Hive and the likes?
It's even kind of been done (kind of).
LastFm's aggregation/summary engine: http://github.com/zohmg/zohmg
A google search turned up a google code project "mroll" but it doesn't have anything except contact info (no code, nothing). Still, might want to reach out to that guy and see what's up. http://code.google.com/p/mroll/
We managed to create low latency OLAP in HBase by preagragating a SQL query and mapping it into appropriate Hbase qualifiers. For more detail visit below site.
http://soumyajitswain.blogspot.in/2012/10/hbase-low-latency-olap.html
My answer relates to HBase, but applies equally to BigTable.
Urban Airship open-sourced datacube, which I think is close to what you want. See their presentation here.
Adobe also has a couple of presentations (here and here) on how they do "low-latency OLAP" with HBase.
Andrei Dragomir made an interesting talk about how Adobe performs OLAP functionality with M/R and HBase.
Video: http://www.youtube.com/watch?v=5U3EnfiKs44
Slides: http://hstack.org/hbasecon-low-latency-olap-with-hbase/
If you are looking for a table-scan approach, have you considered Google BigQuery? BigQuery does automatic scale-out on the back-side that gives interactive response. There is a good session by Jordan Tigani from the 2012 Google I/O event that explains some of the internals.
http://www.youtube.com/watch?v=QI8623HlYd4
It's not MapReduce but it is geared towards high-speed table scan like what you described.
Large datasets, millions of records, need special programming to maintain speed in DBGrids.
I want to know if there are any ready-made components for Delphi (DBGrids) that do this automatically?
EDIT For Example: Some databases have features such as fetch 1st X records (eg 100 records). When I reach the bottom with scrolling, I want to auto fetch the next 100. Conversely when I reach the beginning, I want to fetch the previous 100. I know I can program this, but it sure is possible to propagate that feature to a DBGrid control where the DBGrid does the buffering. It will save quite a bit of programming - you simply have to set the "buffer size" so to speak.
You might want to take a look at the wonderful (free, open source, dual licensed as MPL 1.1 and GPL thus usable in closed source apps) Virtual TreeView and its user-supplied descendants (scroll down the page to find those.)
Edit to reflect the question's edit: Virtual TreeView not only allows you to handle millions of nodes without keeping them in memory, but that is in fact the preferred way of using it. You supply the data through event callbacks when it's needed, and you can tell the tree to cache that data (or not.)
Oh, and of course it also has a grid / report mode where it can function as a table (just set the GridExtensions property to True.)
I would have a look at Developer Express QuantumGrid Suite. (#birger: you just were a tick faster ;-) ) So I'm not just duplicating the answer, some elaboration:
The DevExpress Grid uses a data controller that has several modes to controll the data bound to the grid. One of these is exactly what you are looking for:
Grid Mode
When using Grid Mode, only a fixed
number of dataset records is loaded
into memory. Because only a limited
set of records are retrieved from the
dataset, automatic sorting, filtering
and summary calculations are disabled
in Grid Mode (must be controlled
manually instead). By default, this
mode is disabled and the
ExpressDataController loads all
records in a dataset.
It does have some drawbacks, which seem pretty obvious: you cannot make a summary, sort, or filter if you do not have all records at hand.
NextGrid is light, fast and nice looking grid for Delphi
http://www.bergsoft.net/component/next-grid/features.htm
HANDLING LARGE AMOUNT OF CELLS WITHOUT LOOSING SPEED
NextGrid can handle very large amount
of cells without losing speed. Speed
of adding, modifying and deleting data
doesn't depend of the amount of cells.
In NextGrid demo you can see how fast
NextGrid work with 100,000 rows and 10
columns = 1,000,000 cells
I think the DevExpress Quantumgrid supports this very good.
sorry, I just saw your comment to Neftalí
if you would to bring 100 record per time, and then fetch the next 100, this work related to database access components, look at devart components, they are offer direct access components to most used database, and they have the feature you are asking about and more:
http://www.devart.com/products-vcl.html