Which performs best: an Excel formula, a pivot table, or VBA code? - performance

I have a (growing) table of data with 40,000 rows and 20 columns.
I need to group this data (by month and week) and perform some simple operations (+ and /) between rows/columns.
I must be able to change the period in question and the specific rows to sum up. I know how to use macros, pivot tables and formulas, but I haven't started yet, and I would like recalculation to be as fast as possible, not a case of clicking a button and having everything freeze for minutes.
Do you have any idea on what could be the most efficient solution?
Thank you

Excel has its limits when it comes to storing and analyzing data at the same time.
If you're planning to build a growing database in MS Excel, at some point you will add so much data that the Excel files will stop working (or using them won't be time-effective).
Before you get to that point you should be looking for alternative storage options as a scalable data solution.
They can be simple, like an Access DB, SQLite, PostgreSQL, MariaDB, or even PowerPivot (though this can have its own issues).
Or more complex, like storing the data into a database, then adding an analysis cube and pulling smaller slices of data from these databases, into Excel for analysis and reporting.
Regardless of what you end up doing you will have to change how Excel interacts with the data.
You need to move all of the raw data to another system (Access or SQL are the easiest, but Excel supports a lot of other DB options) and pull smaller chunks of data back into Excel for time effective analysis.
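As a rough sketch of that idea, the example below keeps the raw rows in a database and pulls back only a small aggregated slice for a chosen period and set of categories. It uses Python's built-in `sqlite3` module as a stand-in for whichever database you pick; the `raw_data` table, its columns, and the function names are hypothetical, not something from the question.

```python
import sqlite3

# Assumption: '%Y-%m' groups by calendar month, '%Y-%W' by week of year.
PERIOD_FORMATS = {"month": "%Y-%m", "week": "%Y-%W"}


def build_demo_db():
    """Create an in-memory database with a few raw rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE raw_data (entry_date TEXT, category TEXT, amount REAL)"
    )
    rows = [
        ("2023-01-05", "A", 10.0),
        ("2023-01-20", "B", 5.0),
        ("2023-02-03", "A", 7.5),
        ("2023-02-10", "A", 2.5),
        ("2023-03-01", "B", 4.0),
    ]
    conn.executemany("INSERT INTO raw_data VALUES (?, ?, ?)", rows)
    return conn


def period_totals(conn, start, end, categories, period="month"):
    """Return (period, total) pairs for the chosen date range and categories.

    The grouping and summing run inside the database, so only the small
    aggregated result travels back to the caller.
    """
    fmt = PERIOD_FORMATS[period]
    placeholders = ",".join("?" for _ in categories)
    sql = (
        "SELECT strftime(?, entry_date) AS bucket, SUM(amount) "
        "FROM raw_data "
        "WHERE entry_date BETWEEN ? AND ? "
        f"AND category IN ({placeholders}) "
        "GROUP BY bucket ORDER BY bucket"
    )
    return conn.execute(sql, [fmt, start, end, *categories]).fetchall()
```

Changing the period or the rows to include then means changing the parameters of one query rather than recalculating a whole workbook; Excel would only receive the handful of aggregated rows.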
Useful Links:
SQL Databases vs Excel
Using Access or Excel to manage your data

Related

How can I load a large amount of data into an Oracle database from a .csv file without risking dropping or mismatching data?

I’m in the middle of trying to migrate a large amount of data into an Oracle database from existing Excel files.
Due to the large number of rows loaded each time (10,000 and more), it is not possible to use SQL Developer for this task.
Every worksheet contains data that needs to go into different tables, while keeping the relations intact and not dropping any data.
For now, I use one .csv file for each table and map them together afterwards. This, though, carries a great risk of adding the wrong FK and thereby screwing up the whole thing. And I don’t have the time, energy or will for clean-ups, even if it is my own mess…
My initial thought was to do a bulk transfer with SQL*Loader, using some kind of PL/SQL script in maybe a .ctl file (the one used for mapping the properties), but it seems like I’m quite out in the bush with that one… (or am I…?)
The other thought was to create a simple program in C# that uses FastMember and load the database that way. (But that means I need to take the time to actually write the program, however small it is.)
I can’t possibly be the only one who has had this issue, but using my not-too-elevated ninja Googling skills ends up with either using SQL Developer (which is not an alternative) or the bulk copy approach from SQL*Loader (where I need to map it all together afterwards).
Are there any alternative solutions to my problem, or are the above solutions the ones I need to cope with?
Did you consider using CSV files as external tables? As they act as if they were ordinary Oracle tables, you can write (PL/)SQL against them, inserting data into different tables in the target schema. That might give you some more freedom & control over what you are doing.
Behind the scenes, it is still SQL*Loader.
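To illustrate the general pattern (not Oracle's external-table syntax itself), here is a minimal sketch in Python using the built-in `sqlite3` module. A plain staging table plays the role of the external table, and the foreign keys are resolved by SQL joins on a natural key instead of by hand. The CSV layout and the `staging`, `customers` and `orders` tables are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical worksheet export: one row carries data for two target tables.
CSV_TEXT = """customer_name,order_ref,amount
Alice,ORD-1,10.50
Bob,ORD-2,20.00
Alice,ORD-3,5.25
"""


def load_via_staging(csv_text):
    """Load CSV rows into a staging table, then fan out into related tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE staging (customer_name TEXT, order_ref TEXT, amount REAL);
        CREATE TABLE customers (
            id INTEGER PRIMARY KEY,
            name TEXT UNIQUE NOT NULL
        );
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id),
            order_ref TEXT UNIQUE NOT NULL,
            amount REAL
        );
        """
    )
    # Step 1: raw load, no mapping yet (the staging table stands in for
    # the external table).
    reader = csv.DictReader(io.StringIO(csv_text))
    conn.executemany(
        "INSERT INTO staging VALUES (:customer_name, :order_ref, :amount)",
        list(reader),
    )
    # Step 2: populate parent and child tables in one transaction; the
    # join on the natural key picks the FK, so it cannot be mistyped.
    with conn:
        conn.execute(
            "INSERT INTO customers (name) "
            "SELECT DISTINCT customer_name FROM staging"
        )
        conn.execute(
            """
            INSERT INTO orders (customer_id, order_ref, amount)
            SELECT c.id, s.order_ref, s.amount
            FROM staging s
            JOIN customers c ON c.name = s.customer_name
            """
        )
    return conn
```

Because both inserts run in a single transaction, a failure leaves the target tables untouched, which avoids the clean-up the question is worried about.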

Data reload after query editor in Power BI

I have some huge CSV files and it takes quite a long time to load them in Power BI. I assume that's normal the first time I load them. But here is the problem: every time I alter the data in Query Editor and then close and apply my changes, Power BI reloads the whole files, which again takes a long time. Isn't it possible for Power BI to "reload" or "reread" only the altered data? (I know about the "Enable load" and "Include in report refresh" features, but they don't help.)
I don't know if I made myself clear, if not, let me know what you don't understand.
The main problem here is the performance of Power BI, which always reloads the whole file(s) when you alter the query.
Thanks a lot.
There's no Power BI solution for this - ref the many popular Ideas on their Community site for "incremental load" etc.
My typical workarounds are:
Pre-load CSV data to SQL Server or similar. PBI development will be much quicker (e.g. effective test filters) and you can possibly pre-aggregate in SQL.
Pre-process Queries using Excel Power Query, saving the results as Excel Tables. You can copy and paste the Query definitions between PBI and Excel.

"Saving" BigQuery Views for use in Tableau

I'm trying to make faster dashboards in Tableau by creating views of my calculations directly in BigQuery.
Based on my understanding of the gcloud documentation here, the view will re-execute the query each time it is accessed, so it kind of defeats my goal.*
*My goal is to eliminate calculations on the fly, be it in Tableau or BigQuery.
Is it possible to "save" these views, by way of scheduled scripts or workflows?
Thanks,
A view is best thought of as a way to reformat a table to make it look more convenient to further queries. The query still has to run on BigQuery so the benefits will be that the view may look simpler to Tableau than the raw table (particularly convenient if the view uses some complex SQL to create some of its columns). But it won't save calculation time.
But, if your view is doing some complex consolidation of a larger table then it might be worth saving the results as a new table instead of creating a view. This is OK if your underlying table doesn't change frequently (rule of thumb if you use the results every day and the table changes weekly, it is probably worthwhile and certainly so if the changes are monthly). Then Tableau will be querying pre-consolidated results rather than the much larger raw table. BigQuery storage and processing is cheap so this is often a reasonable solution.
Another alternative is to use a Tableau extract to bring the data into your local drive or server. This is only practical if the table is small enough to fit locally and will only work really well for speed if it fits into local memory (which can be a lot more than you might think). But extracts, at least on Tableau server, can be set to refresh on a schedule, making much faster user interaction and absolving you of having to remember to manually update the consolidated table.

Delphi: ClientDataSet is not working with big tables in Oracle

We have a TDBGrid that is connected to a TClientDataSet via a TDataSetProvider in Delphi 7 with an Oracle database.
It works fine for showing the content of small tables, but the program hangs when you try to open a table with many rows (for example, 2 million rows) because TClientDataSet tries to load the whole table into memory.
I tried setting "FetchOnDemand" to True for our TClientDataSet and "poFetchDetailsOnDemand" to True in Options for TDataSetProvider, but it does not solve the problem. Any ideas?
Update:
My solution is:
TClientDataSet.FetchOnDemand = True
TDataSetProvider.Options.poFetchDetailsOnDemand = True
TClientDataSet.PacketRecords = 500
I managed to solve the problem by setting the "PacketRecords" property of the TClientDataSet. This property indicates the number or type of records in a single data packet. PacketRecords defaults to -1, meaning that a single packet should contain all records in the dataset, but I changed it to 500 rows.
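The same idea of fetching in fixed-size packets, rather than pulling the whole result set at once, can be sketched outside Delphi. The example below is an analogy in Python using the standard DB-API `fetchmany` call on an in-memory SQLite table; it is not the TClientDataSet mechanism, and `big_table` and the helper names are hypothetical.

```python
import sqlite3


def iter_packets(cursor, packet_size=500):
    """Yield lists of at most packet_size rows until the cursor is exhausted."""
    while True:
        rows = cursor.fetchmany(packet_size)
        if not rows:
            break
        yield rows


def demo_packet_sizes(total_rows=1234, packet_size=500):
    """Fill a demo table and report how many rows each packet contained."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE big_table (id INTEGER PRIMARY KEY, payload TEXT)"
    )
    conn.executemany(
        "INSERT INTO big_table (payload) VALUES (?)",
        ((f"row {i}",) for i in range(total_rows)),
    )
    cursor = conn.execute("SELECT id, payload FROM big_table ORDER BY id")
    # Only one packet is held in memory at a time inside the loop.
    return [len(packet) for packet in iter_packets(cursor, packet_size)]
```

With 1234 rows and a packet size of 500, `demo_packet_sizes()` returns `[500, 500, 234]`: the client never holds more than 500 rows from the query at once.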
When working with an RDBMS, and especially with large datasets, trying to access a whole table is exactly what you shouldn't do. That's a typical newbie mistake, or a habit borrowed from old file-based small database engines.
When working with an RDBMS, you should load only the rows you're interested in, display/modify/update/insert, and send the changes back to the database. That means a SELECT with a proper WHERE clause and also an ORDER BY. Remember that row ordering is never guaranteed when you issue a SELECT without an ORDER BY; a database engine is free to retrieve rows in whatever order it sees fit for a given query.
If you have to perform bulk changes, you need to do them in SQL and have them processed on the server, not load a whole table client side, modify it, and send changes row by row to the database.
Loading large datasets client side may fail for several reasons: lack of memory (especially in 32-bit applications), memory fragmentation, and so on. You will also probably flood the network with data you don't need and force the database to perform a full scan, possibly flooding the database cache as well.
Client datasets are therefore not designed to handle millions or billions of rows. They are designed to cache the rows you need client side, and then apply changes to the remote data. You need to change your application logic.

How to optimize data fetching in SQL Developer?

I am working in Oracle SQL Developer and have created 7 tables with Wikipedia data, so the data is very large. I have created a search engine which fetches and displays data using JSP, but the problem is that for each query the application has to access 4 tables, which makes my application very slow.
I have added indexes to all tables, but it still takes too long. Any suggestions on how to optimize my app and reduce the time it takes to display results?
There are several approaches you can take to tune your application. And it could be either tuning at the database end, front end or a combination of the two.
At the database end you could look at, say, a materialized view to summarize the more commonly searched data. This could either be for your search purposes only or to reduce the size and complexity of the resultset. You might also look at tuning the query itself, perhaps by placing indexes on the columns used in the WHERE clauses of your search, or look at denormalizing your tables.
At the application end, retrieving vast recordsets can always cause problems when a single record is large (many columns) and the number of records in the resultset is high.
What you are probably looking for is a rapid response time from your application so your user doesn't feel they are waiting ... and waiting.
A technique I have seen and used is to retrieve the resultset either as
1) a recordset of ROWIDs and to page through these ROWIDs on the display
2) a simulated "paged" recordset. Retrieving the recordset in chunks.
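A minimal sketch of the first technique, assuming Python and the built-in `sqlite3` module instead of Oracle and JSP: the search runs once and returns only the keys, and each page then fetches the full rows for its own slice of keys. The `articles` table and function names are hypothetical, and an integer primary key stands in for Oracle's ROWID.

```python
import sqlite3


def build_demo_db(n=25):
    """Create an in-memory table of n articles (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO articles (title, body) VALUES (?, ?)",
        [(f"Topic {i}", f"Long body text {i}") for i in range(1, n + 1)],
    )
    return conn


def find_ids(conn, term):
    """Run the search once and keep only the matching keys (cheap to hold)."""
    cursor = conn.execute(
        "SELECT id FROM articles WHERE title LIKE ? ORDER BY id",
        (f"%{term}%",),
    )
    return [row[0] for row in cursor]


def fetch_page(conn, ids, page, page_size=10):
    """Fetch the full rows for one page of previously found keys."""
    chunk = ids[page * page_size:(page + 1) * page_size]
    if not chunk:
        return []
    placeholders = ",".join("?" for _ in chunk)
    cursor = conn.execute(
        f"SELECT id, title, body FROM articles "
        f"WHERE id IN ({placeholders}) ORDER BY id",
        chunk,
    )
    return cursor.fetchall()
```

The user sees the first page as soon as one small lookup by key completes, and the wide columns for later pages are only read if the user actually navigates to them.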
