I am developing a report application in Power BI Desktop. I successfully created a dataset using a query and applied filters to the resulting data. But now I have to get data from the database in real time with user filters, i.e. the dataset would be created based on inputs given by users. We need this because the database is quite large and we cannot load all the data first and then apply filters and create reports.
The same can easily be done in a .NET application, but we have to achieve this in Power BI.
Please suggest if this can be done.
I would use the Query Parameters feature for this. You add them in the Edit Queries window, from Home / Manage Parameters; then you can use them in calculated columns or to replace a "hard-coded" filter.
There's a detailed write-up in a recent blog post:
https://powerbi.microsoft.com/de-de/blog/deep-dive-into-query-parameters-and-power-bi-templates/
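To illustrate why this helps with a large database: when a parameter replaces a hard-coded filter step on a relational source, Power Query can usually fold that filter into the query it sends to the server, so only the matching rows are ever loaded. As a minimal sketch (the table, columns, and parameter value below are hypothetical), the folded query would look roughly like this for a parameter currently set to 'West':

-- Hypothetical query sent to the source once the filter step folds;
-- the literal 'West' is the current value of the report parameter.
SELECT order_id, order_date, region, amount
FROM dbo.sales_orders
WHERE region = 'West';

The report only ever pulls the filtered slice, which is what avoids loading the whole table into Power BI first.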
I have two chart tables, each with a different data source. I want one table to act as a filter for the other table.
Here is the problem...
I tried a custom query for my data source which used the email parameter to filter the data source.
The problem is that every time a user changes a filter on any page, a query is executed in BigQuery, slowing the results and rapidly increasing my monthly BigQuery charges.
I tried blending the two tables.
The problem is the blended data feature only allows for 10 dimensions to be added to the resulting blended data source and is very slow.
I tried creating a control filter using a custom field on the "location" column on each table sharing the same "Field Id".
The problem is that the results table returns all the stores until you click on a location in the control list. And I cannot let a user see other locations.
Here is a link to a Data Studio sample report where you can clearly see what I am trying to do.
https://datastudio.google.com/reporting/dd33be45-ab13-4881-8a3b-cabafa8c0dbb
Thanks
One solution I can recommend to overcome your first challenge (high cost) is to cache results with GCP Memorystore, depending on how frequently the data is updated.
Moreover, BigQuery also caches the results of a query as long as you are not using wildcard tables or time-partitioned tables. So try to optimize your solution for analysis cost if that is feasible. BigQuery partitioning and clustering may also help you reduce BigQuery analysis cost.
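As a rough illustration of the partitioning and clustering suggestion (the table and column names below are hypothetical), the store data could be declared in BigQuery so that date and location filters scan far less data:

-- Hypothetical BigQuery DDL: partition by day and cluster by the columns
-- the dashboard filters on, so each filtered query scans fewer bytes.
CREATE TABLE IF NOT EXISTS `my_project.my_dataset.store_sales`
(
  sale_date  DATE,
  location   STRING,
  user_email STRING,
  amount     NUMERIC
)
PARTITION BY sale_date
CLUSTER BY location, user_email;

With this layout, a report filtered to one location and a short date range is billed only for the partitions and blocks it touches, which also helps with the cost concern above.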
I'm endeavoring to develop an application that uses Oracle as the database back-end. The application will calculate several statistics from the various tables in the database. The front-end will most likely be a web application, and this front-end will display various charts and calculated statistics. Now, I imagine that it would be more efficient to perform the calculations in the database rather than in the service layer, because said calculations would need to be performed for every web request. That being the case, I'm not sure which mechanism to use (e.g. stored procedure, function, view).

To illustrate what I'm going for, suppose I want to keep statistics of student grades for many students. I would like to have a web interface that lets me view those statistics on a student-by-student basis and also on an all-inclusive basis. Some of the stats are dependent on aggregates (e.g. average, min, max) of all of the student grades, and some stats are dependent only on an individual student. In this situation, every time a record is added or updated, the aggregates would have to be recalculated.

So I am speculating that if I had a special table that held all of the calculated values I need, and a trigger(s) to recalculate everything when a record is added/updated, then all I would need to do from a web-request point of view is have the service layer pull the desired values from this special table. I'm just not sure if this is the best way to go, so I am asking the community for any input/advice. Note: although I'm using Oracle, I'm open to using PostgreSQL or MySQL.
Thanks in advance
The scenario you are describing would be ideal for using materialized views. They can be designed to refresh automatically (and incrementally) every time the source data is updated by your application. The calculations would be built in to the view definition. No triggers required, and likely no stored procedures unless your calculations involve multiple steps. Check here: https://oracle-base.com/articles/misc/materialized-views and here: https://medium.com/oracledevs/lightning-fast-sql-with-real-time-materialized-views-12-things-developers-will-love-about-oracle-54bcc9eac358 for more info.
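As a rough sketch of that approach (the table and column names are made up for illustration), the per-student aggregates from the question could be maintained with something like the following, assuming a GRADES(student_id, score) table:

-- Hypothetical Oracle example: keep per-student aggregates refreshed on commit.
-- A materialized view log on the base table is required for fast (incremental) refresh.
CREATE MATERIALIZED VIEW LOG ON grades
  WITH SEQUENCE, ROWID (student_id, score) INCLUDING NEW VALUES;

CREATE MATERIALIZED VIEW student_grade_stats
  REFRESH FAST ON COMMIT
AS
SELECT student_id,
       COUNT(*)     AS grade_count,
       COUNT(score) AS score_count,   -- needed alongside AVG for fast refresh
       SUM(score)   AS total_score,
       AVG(score)   AS avg_score,
       MIN(score)   AS min_score,
       MAX(score)   AS max_score
FROM   grades
GROUP  BY student_id;

Note that in Oracle, MIN and MAX aggregates are only fast-refreshable for insert-only changes; if grades can be deleted or updated, it may be simpler to drop those two columns from this view or compute them separately. The web service layer then just selects from STUDENT_GRADE_STATS instead of recomputing anything per request.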
I am looking for a way to extract Power Query metadata from the Power Query Editor into a spreadsheet or Word document, for documentation purposes, so the transformations or formulas applied in each query in the editor can be understood.
I have read various comments on other sites, including renaming the .XLSX to .ZIP, and that inside xl\connections.xml there is a Microsoft.Mashup.OleDb.1 data connection with some metadata, but I have not been successful in extracting the query metadata. I am looking for an automated process to extract Power Query transformation data into a spreadsheet outside of Power Query. Any suggestions or ideas would be a great help.
You can access the code underlying any Power Query in Excel through the Queries collection that is part of the workbook. The M code is in the "Formula" property of each Query object, and you can get the name of the query from the "Name" property. It just gives you the code as plain text, so it would be up to you to apply any context to that.
Dim i As Long
For i = 1 To ThisWorkbook.Queries.Count
    ' Print each query's name and its M code to the Immediate window
    Debug.Print ThisWorkbook.Queries(i).Name
    Debug.Print ThisWorkbook.Queries(i).Formula
Next i
Note that this only works in Excel 2016 or later; older versions of Excel, where Power Query is installed as an add-in, can't access it through VBA. I'm also unaware of any method to extract information on the dependencies between queries within a workbook (though with consistent naming conventions you could pretty easily build this yourself, I figure).
Based on the following use case, how flexible are pentaho tools to accomplish a dynamic transformation?
The user needs to make a first choice from a catalog. (using a web interface)
Based on the previously selected item, the user has to select from another catalog (this second catalog must be filtered based on the first selection).
Steps 1 and 2 may repeat in some cases (i.e. more than two dynamic and dependent parameters).
From what the user chose in steps 1 and 2, the ETL has to extract information from a database. The tables to select data from will depend on what the user chose in the previous steps. Most of the tables have a similar structure but different names based on the selected item. Some tables have a different structure, and the user has to be able to select the fields in step 2, again based on the selection in step 1.
All the selections made by the user should be saved, so the user doesn't have to repeat the selection in the future and can simply re-run the process to get updated information based on the pre-selected filters. However, he/she must be able to make a different selection and save it for further use if he/she wants different parameters.
Is there any web-based tool that allows the user to make all these choices? I made the whole process using Kettle, but not dynamically, since all the parameters need to be passed when running the process in the console. The thing is, the end user doesn't know all the parameter values unless you show them and let them choose, and some parameters depend on a previous selection. When testing I can use my test-case scenario parameters, so I have no problem, but in production there is no way to know in advance what combination the user will choose.
I found a similar question, but it doesn't seem to require user input between transformation steps.
I'd appreciate any comments about the capabilities of Pentaho tools to accomplish the aforementioned use case.
I would disagree with the other answer here. If you use CDE, it is possible to build a front end that will easily do the prompts you describe. And the beauty of CDE is that a transformation can be a native data source via the CDA data access layer. In this environment, Kettle is barely any slower than executing the query directly.
The key thing with PDI performance is to avoid starting the JVM again and again; when running in a web app the JVM is already running, so performance will be good.
Also, the latest release of PDI 5 will have the "light JDBC" driver (for EE customers), which is basically a SQL interface on top of PDI jobs. That again shows that PDI is much more these days than just a "batch" ETL process.
This is completely outside the realm of a Kettle use case. The response time from Kettle is far too slow for anything user-facing. Its real strength is in running batch ETL processes.
See, for example, this slideshow (especially slide 11) for examples of typical Kettle use cases.
I am running into issues with out-of-memory exceptions. I need to display a large set of data in a cross tab: 5,277,888 rows aggregated into 403,920 rows. I don't think BIRT can handle this and would like some advice.
These are the options I was considering:
Somehow fetch a portion of the data at a time and aggregate it (might still run out of memory)
Find a different reporting framework that renders HTML
Not use a cross tab, do all of the aggregation server-side, and try to display it in a pseudo cross tab.
Fetching a large amount of data and providing it to BIRT increases data traffic and also frequently (as in your case) leads to the system / report engine hanging.
What you are thinking (option 3) is correct. It is often preferable to use aggregate functions in your database and give already-summarized data to BIRT.
SQL also provides options for cross-tab output (the SQL PIVOT function), in case that is required.
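As a rough illustration of pushing both the aggregation and the cross-tab shape into the database (the table and column names are hypothetical, and the syntax assumes an Oracle-style PIVOT), the report query could look something like this, so BIRT only receives the pre-aggregated rows:

-- Hypothetical example: aggregate and pivot in the database so the report
-- engine only renders summarized rows instead of millions of detail rows.
SELECT *
FROM (
    SELECT region, quarter, amount
    FROM   sales
)
PIVOT (
    SUM(amount)
    FOR quarter IN ('Q1' AS q1, 'Q2' AS q2, 'Q3' AS q3, 'Q4' AS q4)
);

BIRT can then render the result with an ordinary table element rather than a cross tab, keeping the report engine's memory use proportional to the summarized row count.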