For testing purposes I deployed:
an Azure SQL DB with some data
a Tabular Model in Azure Analysis Services connected to the SQL DB to get the data
The test was meant to compare the speed of the queries against the Azure SQL DB with those against the Tabular Model.
The Tabular Model in the tests consists of 4 dimensions, but only 2 of those dimensions are used in the queries. I suppose queries against a Tabular Model cannot handle more than 2 dimensions?
The queries are run from a .NET console application on a local computer. The queries against the Tabular Model use the ADOMD.NET client library and are written in DAX (a language I have no experience with), generated by the query designer in SSMS. The queries against the SQL DB use the ADO.NET client library and consist of a single SQL statement containing an aggregate function, 7 inner joins and some WHERE clause parameters.
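For reference, the hand-written SQL has roughly this shape (the table and column names here are made up, and it is trimmed to two joins for brevity):

    -- Hypothetical tables; the real query has 7 inner joins and its own parameters.
    SELECT  c.Region,
            p.Category,
            SUM(s.Amount) AS TotalAmount
    FROM    dbo.Sales       AS s
    INNER JOIN dbo.Product  AS p ON p.ProductId  = s.ProductId
    INNER JOIN dbo.Customer AS c ON c.CustomerId = s.CustomerId
    WHERE   s.OrderDate >= @FromDate
      AND   s.OrderDate <  @ToDate
    GROUP BY c.Region, p.Category;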
The test consisted of 10 queries against each system with a waiting time of 500 ms between queries. The time of each query, plus the overhead of the console app executing the client library, was measured with a System.Diagnostics.Stopwatch. The average duration of the Tabular Model queries (957.6 ms) was almost twice that of the SQL DB queries (529.1 ms).
I expected the queries against the Tabular Model to be faster because Analysis Services is optimized for analytical queries containing aggregates and joins.
Can anyone explain why it doesn't perform better? Or why one would use a Tabular Model as opposed to running SQL queries directly against the relational DB?
The time the queries take to execute on the SQL DB should be roughly the same in both cases, unless your hand-crafted SQL performs particularly poorly. The time taken by Analysis Services to fit the data coming back from the SQL DB to your semantic model is where the extra delay comes from.
The value of using Direct Query here is that you can provide the user with a semantic model that is more intuitive to them since it is expected the user will not be a DBA. On top of this, the semantic model will in all likelihood include calculations, measures, KPIs, etc.
If you do not need to provide a business focused semantic model, and you are happy doing all calculations and aggregations in the SQL query then you may not need Analysis Services.
Of course, the other advantage of using Analysis Services with Direct Query mode off is that you can store the data in-memory rather than on disk to improve query performance. Another major benefit is that you can point the semantic model at multiple data sources, so your model can be a centralized source of data for business users.
Finally, there is no limit to the number of dimensions the Tabular Model can use...
The Tabular Model in the tests consists of 4 dimensions but only 2 of those dimensions are used in the queries. I suppose queries against a Tabular Model cannot handle more than 2 dimensions?
I am researching the best possible state for data to be in so that reporting and BI analytics perform well and can be produced by business users from a set of various data collections, all aligned with a business data glossary that I have worked through.
We have not chosen a specific BI tool but have been playing around with Power BI and Sisense.
We have not decided on a data store technology to use for reporting purposes.
Origin Data
Our business application that the data will originate from has a normalised SQL relational database. There are quite a few tables and joins to consider, which work fine from an application perspective, but I have recommended supplying the output of those queries as a flat, denormalised set of data, accepting the added redundancy in order to remove the joins entirely.
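As a purely illustrative sketch (the table and column names are made up), I picture the denormalised reporting set being exposed as something like a view over the normalised tables:

    -- Flatten the normalised application tables into one denormalised reporting set,
    -- trading redundancy for the removal of joins at report time.
    CREATE VIEW reporting_trade_positions AS
    SELECT  t.trade_id,
            t.snapshot_date,
            t.quantity,
            t.price,
            e.entity_name,
            e.entity_type,
            a.asset_type,
            a.asset_status
    FROM    trade_position t
    JOIN    entity e ON e.entity_id = t.entity_id
    JOIN    asset  a ON a.asset_id  = t.asset_id;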
Business Data Glossary
As we go through defining the business data glossary, the number of columns increases, but I do not anticipate there being more than 100 columns per row for a complete reporting set of data. I wanted to ensure that each row of data is at transactional depth (level 0) and that the roll-up through the data would be done through aggregations by distinct key values and dimensional taxonomy.
Architecture
I want some advice around what a modern architecture looks like and what works for business users rather than users who are comfortable with SQL queries and a myriad of joins on a physical data model.
I read an article about setting up dataflows for Power BI which looked like the type of thing I want to do from a data availability perspective, but it doesn't advise on how the data should be stored or what type of database to use.
Data Sets
The data we have that needs to be reported on are transactions, where level 0 is trade positions (individual transactions from either a local or counterparty entity), level 1 is reconciliations (relating local and counterparty entities via a trade-linking identifier) and level 2 is where the data can be rolled up by taxonomy such as asset type or status.
The current data set would be a snapshot of positions every business day, so it is duplicated every day with a snapshot date applied. The reports would need to be able to move across dates and show changes over time.
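To make the grain concrete, here is the kind of level-2 roll-up over those daily snapshots I have in mind (using the hypothetical reporting_trade_positions set described above):

    -- Aggregate level-0 positions by snapshot date and asset type so a report
    -- can move across dates and show change over time.
    SELECT  snapshot_date,
            asset_type,
            COUNT(*)      AS position_count,
            SUM(quantity) AS total_quantity
    FROM    reporting_trade_positions
    GROUP BY snapshot_date, asset_type
    ORDER BY snapshot_date, asset_type;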
Any advice would be greatly appreciated on how to tackle reporting and BI in 2020. Oooh, one last thing: there is the possibility that we won't be allowed to process this type of data in the public cloud. We have our own infrastructure on a private cloud, so that might need to be a consideration. Thanks
We have a B4ms VM running a SQL server (as well as web server). We have installed Power BI Gateway on it to make reports with on-prem data.
Basically the user can sign in to the server and view Power BI reports in the browser.
I find it a bit dumb that the user has to query Power BI for the data, which in turn gets it from the machine, but perhaps there is no other way.
The issue we are running into is that some visuals take a huge performance hit when loading. Some even seem to exceed the resources.
I know it's somewhat of a broad question to ask, but maybe specifically - is there a way to improve the connection between the VM and the PBI server?
It will depend on the type of query that you are sending down to SQL Server. For a number of projects that I have deployed, I have used Direct Query over data sources of at least 50-100 GB; however, these have mostly been standard star schema data warehouses or a defined reporting table, both with the relevant indexes, covering indexes, or columnstore indexes to allow more efficient retrieval of data. These have been on Azure SQL and on-prem SQL instances.
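For example, on a typical reporting or fact table (the table and column names below are hypothetical), the kind of indexing I mean looks like this:

    -- A clustered columnstore index suits scan-heavy analytical queries...
    CREATE CLUSTERED COLUMNSTORE INDEX ccix_FactSales ON dbo.FactSales;

    -- ...or, on a rowstore table, a covering index for the common date filter.
    CREATE NONCLUSTERED INDEX ix_FactSales_OrderDate
        ON dbo.FactSales (OrderDate)
        INCLUDE (CustomerKey, SalesAmount);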
Direct Query mode will slow down due to the number of queries that it has to run on the data source based on the measures, relationships and the connection overhead. Another factor can be the number of visuals on a page, as each visual generates a query and each one has to run on the data source.
One other method to increase the speed of Direct Query would be to use Aggregations in Power BI, to store an imported subset of data in Power BI. If the query can be answered by the aggregation layer then it will be answered quicker. Microsoft demonstrated this with the 'Trillion Row Demo'
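The imported aggregation table is essentially a pre-summarised subset of the detail table; as a rough, hypothetical sketch of its source query:

    -- Pre-summarise the detail fact table at date/product grain; visuals that can be
    -- answered at this grain never hit the DirectQuery source.
    SELECT  OrderDate,
            ProductKey,
            SUM(SalesAmount) AS SalesAmount,
            COUNT(*)         AS SalesCount
    FROM    dbo.FactSales
    GROUP BY OrderDate, ProductKey;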
In terms of the Power BI Direct Query issues, from the range of clients that I interact with, those that do have issues with Direct Query have a mash-up of tables in an inefficient schema, run sub-optimal queries on the data source, do a number of data transformations in DAX, and have badly written DAX measures, for example lots of DISTINCT COUNTs and SWITCHes.
For the connection, make sure you have the latest data gateway installed and updated, as optimizations to the mashup engine can make it faster. Another option would be to shift the DB to Azure SQL Database and remove the need for the gateway.
For DirectQuery reports you need to examine the generated SQL and evaluate the execution at SQL Server. You can use the Performance Analyzer in Power BI Desktop to capture the DAX and SQL generated as your DirectQuery model interacts with SQL Server, and then use SQL Server Management Studio and the Query Store to examine the Execution Plans and indexing options.
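As a starting point for that investigation, something like the following against the standard Query Store catalog views will surface the statements DirectQuery is sending and their average duration (assuming Query Store is enabled on the database):

    -- Slowest statements captured by Query Store, by average duration.
    SELECT TOP (20)
            qt.query_sql_text,
            rs.avg_duration / 1000.0 AS avg_duration_ms,
            rs.count_executions
    FROM    sys.query_store_query_text    AS qt
    JOIN    sys.query_store_query         AS q  ON q.query_text_id = qt.query_text_id
    JOIN    sys.query_store_plan          AS p  ON p.query_id      = q.query_id
    JOIN    sys.query_store_runtime_stats AS rs ON rs.plan_id      = p.plan_id
    ORDER BY rs.avg_duration DESC;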
Here is the current scenario: we have 3 tables in an Oracle DB (with millions of records) which are being used to generate SSRS reports.
These reports display complex data calculations such as deviations, medians, etc.
SSRS fetches data using stored procs in Oracle (joining all 3 tables) based on date parameters
Calculations are performed in SSRS and data is displayed in tables and charts
Now, for a small date range, the report is generated quite fast, so no issues there.
When the date range is big, like a week or 2-3 months, the report takes a lot of time to process and most of the time it times out as well.
To resolve this issue, I am thinking of removing the calculations from SSRS and moving them to the DB level, where we can have pre-calculated data that will be served to the SSRS reports for faster report generation.
In order to do this, I can see 2 options -
Oracle Materialized Views
SSAS Cube
I have never used Materialized Views before, so I am a bit skeptical about their performance, especially the FAST REFRESH issues.
What way would you prefer? MV or SSAS or mix of both?
Data models (SSAS) are great for organizing data, consolidating business logic, and defining how calculations behave in different scopes. They are generally faster to query than the raw data, which is what you are querying currently. There is some caching involved, but you still have to query the data and wait for it to be processed. Models are also most appropriate when you have multiple reports that will be using a common set of data.
With a materialized view, you can shift the heavy lifting of calculation time to the scheduled refresh. Think of it as essentially the same as creating a new table that is refreshed by a procedure. This will greatly improve query times for the report especially if the date column you're filtering on is indexed. Also, the development and maintenance requirements are much lower for this than a model.
So, based on your specifications I would suggest the materialized view.
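If you go that route, a minimal Oracle sketch might look like the following (the table, column and MV names are hypothetical, and I've assumed a nightly complete refresh):

    -- Pre-compute the per-day figures once, on a schedule, instead of at report time.
    CREATE MATERIALIZED VIEW mv_daily_stats
      BUILD IMMEDIATE
      REFRESH COMPLETE
      START WITH SYSDATE NEXT SYSDATE + 1
    AS
    SELECT  t.trade_date,
            t.instrument_id,
            MEDIAN(t.price) AS median_price,
            STDDEV(t.price) AS price_deviation,
            COUNT(*)        AS trade_count
    FROM    trades t
    GROUP BY t.trade_date, t.instrument_id;

    -- Index the column the report's date parameters filter on.
    CREATE INDEX ix_mv_daily_stats_date ON mv_daily_stats (trade_date);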
I would concur with the Materialized View (MV) approach. The amount and type of changes (insert vs update vs delete) would determine whether a fast refresh is possible or practical.
Counter-intuitively, a FULL refresh is often a better approach, since you can better take advantage of set-based SQL processing, together with parallelism, to build the MV.
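For example, a non-atomic complete refresh (sketched below against a hypothetical MV like the one above) lets Oracle truncate and rebuild with set-based, direct-path inserts rather than doing a delete-and-insert inside a single transaction:

    BEGIN
      DBMS_MVIEW.REFRESH(
        list           => 'MV_DAILY_STATS',
        method         => 'C',      -- 'C' = complete refresh
        atomic_refresh => FALSE);   -- allow truncate + direct-path rebuild
    END;
    /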
I am working in Oracle SQL Developer and have created tables with Wikipedia data, so the volume of data is huge and there are 7 tables. I have created a search engine which fetches and displays data using JSP, but the problem is that for each query the application has to access 4 tables, which makes it very slow.
I have added indexes to all the tables but it still takes a long time, so any suggestions on how to optimize my app and reduce the time it takes to display results would be welcome.
There are several approaches you can take to tune your application, and it could be tuning at the database end, the front end, or a combination of the two.
At the database end you could be looking at, say, a materialized view to summarize the more commonly searched data. This could either be for your search purposes only or to reduce the size and complexity of the resultset. You might also look at tuning the query itself - perhaps placing indexes on the columns referenced in the WHERE clauses of your search, or looking at denormalizing your tables.
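As a concrete (hypothetical) example of the indexing side, if the search filters on an article title, an ordinary index plus a function-based index lets a case-insensitive search still use an index in Oracle:

    CREATE INDEX ix_article_title ON article (title);

    -- Function-based index, usable by predicates such as
    -- WHERE UPPER(title) LIKE UPPER(:search_term) || '%'
    CREATE INDEX ix_article_title_upper ON article (UPPER(title));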
At the application end, the retrieval of vast recordsets can always cause problems where a single record is large (many columns) and the number of records in the resultset is numerous.
What you are probably looking for is a rapid response time from your application so your user doesn't feel they are waiting ... and waiting.
A technique I have seen and used is to retrieve the resultset either as:
1) a recordset of ROWIDs, paging through those ROWIDs on the display, or
2) a simulated "paged" recordset, retrieving the recordset in chunks (see the sketch after this list).
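A classic Oracle paging pattern for option 2 (the column and bind-variable names are illustrative) pulls only one page of the ordered resultset per round trip:

    SELECT *
    FROM (
          SELECT inner_q.*, ROWNUM AS rn
          FROM (
                SELECT a.article_id, a.title, a.summary
                FROM   article a
                WHERE  UPPER(a.title) LIKE UPPER(:search_term) || '%'
                ORDER BY a.title
               ) inner_q
          WHERE ROWNUM <= :page_end
         )
    WHERE rn > :page_start;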
What is the best way in terms of speed of the platform and maintainability to access data (read only) on Dynamics CRM 4? I've done all three, but interested in the opinions of the crowd.
Via the API
Via the webservices directly
Via DB calls to the views
...and why?
My thoughts normally center around DB calls to the views but I know there are purists out there.
Given both requirements I'd say you want to call the views. Properly crafted SQL queries will fly.
Going through the API is required if you plan to modify data, but it isn't the fastest approach around because it doesn't allow deep loading of entities. For instance, if you want to look at customers and their orders, you'll have to load both up individually and then join them manually, whereas a SQL query will already have the data joined.
Never mind that the TDS stream is a lot more efficient than the SOAP messages being used by the API and web services.
UPDATE
I should point out, in regard to the views and the CRM database in general: CRM does not optimize the indexes on the tables or views for custom entities (how could it?). So if you have a truckload entity that you look up by destination all the time, you'll need to add an index for that property. Depending upon your application it could make a huge difference in performance.
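For instance (the entity and attribute names here are made up, not real CRM schema), the index for that truckload lookup would be something like:

    -- Custom attributes on a CRM 4 custom entity live in its ExtensionBase table.
    CREATE NONCLUSTERED INDEX ix_new_truckload_destination
        ON dbo.new_truckloadExtensionBase (new_Destination);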
I'll add to Jake's comment by saying that querying against the tables directly instead of the views (*Base & *ExtensionBase) will be even faster.
In order of speed (illustrated after this list) it'd be:
direct table query
view query
filtered view query
api call
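To illustrate the first three options against the account entity (the custom attribute new_Region is hypothetical; CRM splits each entity into a *Base and *ExtensionBase table, and the Filtered* views add per-user security filtering, which is what makes them slower):

    -- 1) direct table query
    SELECT b.AccountId, b.Name, e.new_Region
    FROM   dbo.AccountBase b
    JOIN   dbo.AccountExtensionBase e ON e.AccountId = b.AccountId;

    -- 2) view query (the unfiltered view joins Base and ExtensionBase for you)
    SELECT AccountId, Name, new_Region FROM dbo.Account;

    -- 3) filtered view query (applies the calling user's security model)
    SELECT AccountId, Name, new_Region FROM dbo.FilteredAccount;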
Direct table updates:
I disagree with Jake that all updates must go through the API. The correct statement is that going through the API is the only supported way to do updates. There are in fact several instances where directly modifying the tables is the most reasonable option:
One time imports of large volumes of data while the system is not in operation.
Modification of specific fields across large volumes of data.
I agree that this sort of direct modification should only be a last resort when the performance of the API is unacceptable. However, if you want to modify a boolean field on thousands of records, doing a direct SQL update to the table is a great option.
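For example, an unsupported but pragmatic direct update of a (hypothetical) boolean attribute across thousands of rows is a single statement, instead of thousands of individual API update calls:

    -- Hypothetical attribute name and filter.
    UPDATE dbo.new_truckloadExtensionBase
    SET    new_IsPriority = 1
    WHERE  new_Destination = @destination;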
Relative Speed
I agree with XVargas as far as relative speed.
Unfiltered Views vs Tables: I have not found the performance advantage to be worth the hassle of manually joining the base and extension tables.
Unfiltered views vs Filtered views: I recently was working with a complicated query which took about 15 minutes to run using the filtered views. After switching to the unfiltered views this query ran in about 10 seconds. Looking at the respective query plans, the raw query had 8 operations while the query against the filtered views had over 80 operations.
Unfiltered Views vs API: I have never compared querying through the API against querying views, but I have compared the cost of writing data through the API vs inserting directly through SQL. Importing millions of records through the API can take several days, while the same operation using insert statements might take several minutes. I assume the difference isn't as great during reads but it is probably still large.