I have some huge CSV files and it takes me quite a long time to load them in Power BI. I assume that's normal the first time I load them. But here is the problem: every time I alter the data in Query Editor and then close & apply my changes, Power BI reloads the whole files and once again takes a long time. Isn't it possible for Power BI to only "reload" or "reread" the altered data? (I know about the "Enable load" & "Include in report refresh" features, but they don't help.)
I don't know if I made myself clear; if not, let me know what you don't understand.
The main problem here is Power BI's performance: it always reloads the whole file(s) when you alter the query.
Thanks a lot.
There's no Power BI solution for this - see the many popular Ideas on their Community site for "incremental load" etc.
My typical workarounds are:
Pre-load the CSV data to SQL Server or similar (a minimal sketch follows this list). PBI development will be much quicker (e.g. effective test filters) and you can possibly pre-aggregate in SQL.
Pre-process Queries using Excel Power Query, saving the results as Excel Tables. You can copy and paste the Query definitions between PBI and Excel.
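If you take the SQL Server route, the pre-load step can be a very small C# console job. The sketch below is a rough starting point, not a finished loader: the staging table dbo.SalesStaging, the connection string and the file path are assumptions, and the comma-split parsing is deliberately naive (it won't handle quoted fields with embedded commas).

    // Bulk-load a CSV into a SQL Server staging table so Power BI can read from
    // the database instead of re-parsing the file on every Close & Apply.
    using System;
    using System.Data;
    using System.Data.SqlClient;
    using System.IO;
    using System.Linq;

    class CsvToSqlServer
    {
        static void Main()
        {
            const string connectionString = "Server=.;Database=Staging;Integrated Security=true;"; // assumption
            const string csvPath = @"C:\data\sales.csv";                                            // assumption
            const string targetTable = "dbo.SalesStaging";                                          // assumption

            var lines = File.ReadLines(csvPath);
            var header = lines.First().Split(',');

            // Mirror the CSV header as string columns; let the typed staging table
            // (or a later SQL step) handle conversions.
            var table = new DataTable();
            foreach (var column in header)
                table.Columns.Add(column.Trim(), typeof(string));

            foreach (var line in lines.Skip(1))
                table.Rows.Add(line.Split(','));

            using (var connection = new SqlConnection(connectionString))
            using (var bulk = new SqlBulkCopy(connection) { DestinationTableName = targetTable, BatchSize = 10000 })
            {
                connection.Open();
                bulk.WriteToServer(table);   // batched inserts instead of row-by-row
            }
            Console.WriteLine($"Loaded {table.Rows.Count} rows into {targetTable}.");
        }
    }

Once the data is in SQL Server, point the PBI query at the table (or a pre-aggregated view) and filters/aggregations can fold back to the server instead of re-reading the whole file.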
Related
I'm in the middle of trying to migrate a large amount of data into an Oracle database from existing Excel files.
Due to the large number of rows loaded every time (10,000 and more), it is not possible to use SQL Developer for this task.
In every worksheet there's data that needs to go into different tables, while at the same time keeping the relations and not dropping any data.
For now, I use one .CSV file for each table and map them together afterwards. This, though, comes with a great risk of adding the wrong FK and thereby screwing up the whole thing. And I don't have the time, energy or will for clean-ups, even if it is my own mess…
My initial thought was to bulk transfer with SQL*Loader using some kind of PL/SQL script, maybe in a .ctl file (the one used for mapping the properties), but it seems like I'm quite out in the bush with that one… (or am I…?)
The other thought was to create a simple program in C# and use FastMember to load the database that way. (But that means I need to take the time to actually write the program, however small it is.)
I can't possibly be the only one who has had this issue, but trying to use my notToElevatedNinjaGoogling skills ends up with either using SQL Developer (which is not an alternative) or the bulk copy thing from SQL*Loader (where I need to map it all together afterwards).
Are there any alternative solutions for my problem, or are the above solutions the ones that I need to cope with?
Did you consider using CSV files as external tables? As they act as if they were ordinary Oracle tables, you can write (PL/)SQL against them, inserting data into different tables in the target schema. That might give you some more freedom & control over what you are doing.
Behind the scenes, it is still SQL*Loader.
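Since you mentioned writing a small C# program anyway, a rough sketch of driving the external-table route from C# (using the Oracle.ManagedDataAccess client) could look like the following. The DIRECTORY object data_dir, the file name orders.csv and all table/column names are assumptions; the point is that once the CSV is exposed as an external table, plain SQL can populate the parent and child tables from the same rows, so the FKs are derived from the data itself rather than mapped together by hand afterwards.

    // Expose the CSV as an external table, then load parent and child tables from it.
    using Oracle.ManagedDataAccess.Client;

    class LoadFromExternalTable
    {
        static void Main()
        {
            const string connectionString = "User Id=...;Password=...;Data Source=..."; // assumption

            // The CSV behaves like an ordinary table; behind the scenes Oracle still
            // uses the SQL*Loader driver (ORACLE_LOADER).
            const string createExternalTable = @"
                CREATE TABLE stage_orders_ext (
                  customer_id   NUMBER,
                  customer_name VARCHAR2(100),
                  order_id      NUMBER,
                  order_total   NUMBER
                )
                ORGANIZATION EXTERNAL (
                  TYPE ORACLE_LOADER
                  DEFAULT DIRECTORY data_dir
                  ACCESS PARAMETERS (
                    RECORDS DELIMITED BY NEWLINE
                    FIELDS TERMINATED BY ','
                    MISSING FIELD VALUES ARE NULL
                  )
                  LOCATION ('orders.csv')
                )
                REJECT LIMIT UNLIMITED";

            // Parent rows first, then child rows from the same source, so the foreign
            // keys come straight from the data.
            const string loadCustomers = @"
                INSERT INTO customers (customer_id, name)
                SELECT DISTINCT customer_id, customer_name FROM stage_orders_ext";

            const string loadOrders = @"
                INSERT INTO orders (order_id, customer_id, total)
                SELECT order_id, customer_id, order_total FROM stage_orders_ext";

            using (var connection = new OracleConnection(connectionString))
            {
                connection.Open();
                foreach (var sql in new[] { createExternalTable, loadCustomers, loadOrders })
                {
                    using (var command = new OracleCommand(sql, connection))
                        command.ExecuteNonQuery();
                }
            }
        }
    }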
Has anyone else had issues (and solved them!) with very slow queries in Power BI when using the "Edit Query" (Power Query M) side of Power BI?
I am using multiple nested queries (using the Reference option, not Duplicate) to get different aggregation levels, all based on a single table read from a large CSV file.
I expected the data to be read once to the base table and then each derived table would extract data from the locally stored base table. But it seems to go back to the source data multiple times. This takes the run time to over 15 minutes.
Are there options to stop Power BI from going back to the source for each of these?
Answer: it turned out that the folder where the input CSV was being held was extraordinarily slow. When we moved it to test alternatives, the speed problem went away.
I am not sure that you can avoid multiple rereads of an input CSV or Excel file in Power BI.
I have a (growing) table of data with 40,000 rows and 20 columns.
I need to group these data (by month and week) and perform some simple operations (+ & /) between rows/columns.
I must be able to change the period in question and the specific rows to sum up. I know how to use macros/pivots/formulas, but I haven't started yet, and I would like the recalculation process to be as fast as possible - not clicking a button and then having everything freeze for minutes.
Do you have any idea on what could be the most efficient solution?
Thank you
Excel has its limits for storing and analyzing data at the same time.
If you're planning to build a growing database in MS Excel, at some point you will add so much data that the Excel files will not work (or using them won't be time-effective).
Before you get to that point you should be looking at alternative storage options as a scalable data solution.
They can be simple, like an Access DB, SQLite, PostgreSQL, MariaDB, or even PowerPivot (though this can have its own issues).
Or more complex, like storing the data in a database, adding an analysis cube, and pulling smaller slices of data from these databases into Excel for analysis and reporting.
Regardless of what you end up doing you will have to change how Excel interacts with the data.
You need to move all of the raw data to another system (Access or SQL are the easiest, but Excel supports a lot of other DB options) and pull smaller chunks of data back into Excel for time effective analysis.
Useful Links:
SQL Databases vs Excel
Using Access or Excel to manage your data
I have been asked to create a simple program to submit user defined queries to SQLite databases (.db). I have not worked with the offline databases before and have a question about optimizing performance.
There are a few hundred .db files that I need to query. Is it quicker to attach them all to a single query using ATTACH, or to join them all into a single database and work from there? My thinking is that there will be some trade-off between how much time the initial set-up takes versus the query speed. Is there perhaps a different method that would result in better performance?
I don't think it will matter, but this will be written in C# for a Windows desktop.
Thanks!
The documentation says:
The number of simultaneously attached databases is limited to SQLITE_MAX_ATTACHED which is set to 10 by default. [...] The number of attached databases cannot be increased above 62.
So attaching a few hundred databases will be very quick because outputting an error message can be done really fast. ☺
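That said, the "join them all into a single database" option works fine if you do the merging from the C# side, attaching the files one at a time so you never get near SQLITE_MAX_ATTACHED. A minimal sketch with System.Data.SQLite, assuming every file contains a table called readings with an identical schema and that merged.db already has that table created:

    // Merge a few hundred SQLite files into one database, one ATTACH at a time.
    using System;
    using System.Data.SQLite;
    using System.IO;

    class MergeSqliteFiles
    {
        static void Main()
        {
            const string targetPath = @"C:\data\merged.db";    // assumption
            const string sourceFolder = @"C:\data\fragments";  // assumption

            using (var connection = new SQLiteConnection($"Data Source={targetPath}"))
            {
                connection.Open();

                foreach (var dbFile in Directory.EnumerateFiles(sourceFolder, "*.db"))
                {
                    using (var command = connection.CreateCommand())
                    {
                        // Attach the source file, copy its rows into the target,
                        // then detach so only one extra database is ever open.
                        command.CommandText =
                            $"ATTACH DATABASE '{dbFile.Replace("'", "''")}' AS src;" +
                            "INSERT INTO main.readings SELECT * FROM src.readings;" +
                            "DETACH DATABASE src;";
                        command.ExecuteNonQuery();
                    }
                    Console.WriteLine($"Merged {dbFile}");
                }
            }
        }
    }

The up-front merge costs one pass over the files, but afterwards every user query runs against a single database with normal indexes, which is usually a better trade-off than attaching databases per query.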
Looking for a bit of advice on how to optimise one of our projects. We have an ASP.NET/C# system that retrieves data from a SQL Server 2008 database and presents it on a DevExpress ASPxGridView. The data that's retrieved can come from one of a number of databases - all of which are slightly different and are being added and removed regularly. The user is presented with a list of live "companies", and the data is retrieved from the corresponding database.
At the moment, data is being retrieved using a standard SqlDataSource and a dynamically-created SQL SELECT statement. There are a few JOINs in the statement, as well as optional WHERE constraints, again dynamically-created depending on the database and the user's permission level.
All of this works great (honest!), apart from performance. When it comes to some databases, there are several hundreds of thousands of rows, and retrieving and paging through the data is quite slow (the databases are already properly indexed). I've therefore been looking at ways of speeding the system up, and it seems to boil down to two choices: XPO or LINQ.
LINQ seems to be the popular choice, but I'm not sure how easy it will be to implement with a system that is so dynamic in nature - would I need to create "definitions" for each database that LINQ could access? I'm also a bit unsure about creating the LINQ queries dynamically too, although looking at a few examples that part at least seems doable.
XPO, on the other hand, seems to allow me to create a XPO Data Source on the fly. However, I can't find too much information on how to JOIN to other tables.
Can anyone offer any advice on which method - if any - is the best to try and retro-fit into this project? Or is the dynamic SQL model currently used fundamentally different from LINQ and XPO and best left alone?
Before you go and change the whole way that your app talks to the database, have you had a look at the following:
Run your code through a performance profiler (such as Redgate's performance profiler); the results are often surprising.
If you are constructing the SQL string on the fly, are you using .NET best practices such as String.Concat("str1", "str2") instead of "str1" + "str2"? Remember, multiple small gains add up to big gains.
Have you thought about having a summary table or database that is periodically updated (say every 15 minutes; you might need to run a service to update this data automatically) so that you are only hitting one database? New connections to databases are quite expensive.
Have you looked at the query plans for the SQL that you are running? Today, I moved a dynamically created SQL string to a sproc (only 1 param changed) and shaved 5-10 seconds off the running time (it was being called 100-10,000 times, depending on some conditions). A parameterised sproc call is sketched below.
Just a warning if you do use LINQ: I have seen some developers who decided to use LINQ write more inefficient code because they did not know what they were doing (pulling 36,000 records when they needed to check for one, for example). These things are very easily overlooked.
Just something to get you started on and hopefully there is something there that you haven't thought of.
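To make the sproc suggestion concrete, here is a minimal sketch of what replacing the concatenated SQL with a parameterised stored procedure call can look like. The sproc name dbo.GetCompanyOrders, its parameter and the connection string are assumptions; the point is that the SQL text stays constant, so SQL Server can reuse one cached plan instead of compiling a new ad-hoc statement on every call.

    // Call a stored procedure with a parameter instead of building the SQL string on the fly.
    using System.Data;
    using System.Data.SqlClient;

    static class CompanyOrdersRepository
    {
        public static DataTable GetCompanyOrders(string connectionString, int companyId)
        {
            using (var connection = new SqlConnection(connectionString))
            using (var command = new SqlCommand("dbo.GetCompanyOrders", connection))
            {
                command.CommandType = CommandType.StoredProcedure;
                command.Parameters.Add("@CompanyId", SqlDbType.Int).Value = companyId;

                var table = new DataTable();
                using (var adapter = new SqlDataAdapter(command))
                {
                    adapter.Fill(table);   // the adapter opens and closes the connection itself
                }
                return table;
            }
        }
    }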
Cheers,
Stu
As far as I understand, you are talking about the so-called server mode, where all data manipulations are done on the DB server instead of loading the records onto the web server and processing them there. In this mode the grid works very fast with data sources that contain hundreds of thousands of records. If you want to use this mode, you should create either the corresponding LINQ classes or XPO classes. If you decide to use the LINQ-based server mode, the LINQServerModeDataSource provides the Selecting event, which can be used to set a custom IQueryable and KeyExpression. I would suggest that you use LINQ in your application. I hope this information will be helpful to you.
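If it helps, the wiring for the LINQ-based server mode is roughly as follows. This is only a sketch: the LINQ to SQL context (DataClasses1DataContext), the CompanyOrders table and the key column are assumptions for your own schema, and the ASPxGridView is assumed to have its DataSourceID pointing at the LinqServerModeDataSource control.

    // Selecting handler for the LinqServerModeDataSource: hand the grid an IQueryable
    // plus a unique key, and paging/sorting/filtering are translated to SQL on the server.
    using DevExpress.Data.Linq;

    public partial class CompanyGridPage : System.Web.UI.Page
    {
        protected void LinqServerModeDataSource1_Selecting(
            object sender, LinqServerModeDataSourceSelectEventArgs e)
        {
            // Build the IQueryable against whichever database the selected "company" maps to.
            var context = new DataClasses1DataContext(GetConnectionStringForCompany());

            e.KeyExpression = "OrderId";                 // must identify rows uniquely
            e.QueryableSource = context.CompanyOrders;   // the grid queries over this
        }

        private string GetConnectionStringForCompany()
        {
            // Placeholder: resolve the connection string for the selected company/database.
            return "...";
        }
    }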
I guess there are two points where performance might be tweaked in this case. I'll assume that you're accessing the database directly rather than through some kind of secondary layer.
First, you don't say how you're displaying the data itself. If you're loading thousands of records into a grid, that will take time no matter how fast everything else is. Obviously the trick here is to show a subset of the data and allow the user to page, etc. If you're not doing this then that might be a good place to start.
Second, you say that the tables are properly indexed. If this is the case, and assuming that you're not loading 1,000 records into the page at once and are retrieving only subsets at a time, then you should be OK.
But if you're only doing an ExecuteQuery() against a SQL connection to get a dataset back, I don't see how LINQ or anything else will help you. I'd say that the problem is obviously on the DB side.
So to solve the problem with the database, you need to profile the different SELECT statements you're running against it, examine the query plans and identify the places where things are slowing down. You might want to start with SQL Server Profiler, but if you have a good DBA, just looking at the query plan (which you can get from Management Studio) is often enough.