I have a data model using a private data source (not from a Subject Area), so I need to do this with a BI Publisher Report (not an Analysis).
In the data model I have a column called 'Financial Plan Type' that contains a few different values such as "Forecast", "Adjusted Budget", "Original Budget", etc.
I want to create a pivot table that pivots this column and then adds a variance column between, for example, "Adjusted Budget" and "Forecast". Obviously I have an 'Amount' field in the table too.
It doesn't seem that I can do this directly in the report as the formulas and flexibility seem to be limited for the Reports (although I'm not 100% sure of this as I am fairly new to OBIEE), but I was thinking that I could adjust the data model to union in a variance amount or do something else with the data model to make this work. Does anyone have any ideas and/or best practices around doing this either in the data model or in the Report itself?
This is going to be abstract, but you can do this in BIP or in an Analysis; it depends on what the data source looks like.
If you are able to compute the variance as an extra element in the data source (you might need to model it), then the BIP RTF template designer does support pivot tables. You might still need to add some XDO code in the loops.
If you are inclined to OBIEE, you can create your own data source on OBIEE. You will have to use the RPD data modeller if you are on OBIEE on premise, or write the transactional SQL if you are on OTBI in the cloud.
Either way, the trick is to have the variance already computed in the XML, so BIP/OBIEE can simply print it off.
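As a rough illustration of "have the variance already computed" before the pivot, here is a minimal Python sketch. It is not BI Publisher code: the row layout, the `project` grouping key and the `add_variance` helper are all hypothetical, and in a real data model the same idea would more likely be expressed as a UNION in the SQL of the data set.

```python
from collections import defaultdict

# Hypothetical rows as they might come out of the data model:
# (project, financial_plan_type, amount). The "project" key is an assumption.
rows = [
    ("P1", "Adjusted Budget", 1000.0),
    ("P1", "Forecast", 900.0),
    ("P1", "Original Budget", 950.0),
    ("P2", "Adjusted Budget", 500.0),
    ("P2", "Forecast", 650.0),
]


def add_variance(rows, minuend="Adjusted Budget", subtrahend="Forecast"):
    """Return rows plus one extra 'Variance' row per grouping key."""
    totals = defaultdict(lambda: defaultdict(float))
    for key, plan_type, amount in rows:
        totals[key][plan_type] += amount

    out = list(rows)
    for key, by_type in totals.items():
        # A missing plan type counts as 0 here; that is a design choice.
        variance = by_type.get(minuend, 0.0) - by_type.get(subtrahend, 0.0)
        out.append((key, "Variance", variance))
    return out


augmented = add_variance(rows)
```

Because "Variance" is now just another value of the plan type column, the pivot only has to print it like any other column.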
Context:
I have a data model in Power Pivot with three tables: tTasks, tCaseworks and tCaseworkStatus. I am attempting to create two calculated columns in tCaseworks which draw on the two data tables. All three tables are linked through the common field casework_id (see illustration below).
The data model is regularly updated with new data. The way I am doing this is as follows:
All three tables are sourced from three corresponding tables in my Excel workbook.
A VBA script deletes all records in the three Excel tables and then refreshes the data model (sidenote: because the data model demands lookup tables to not be empty the VBA code adds one row per table before refreshing).
New data is then added to the Excel tables and the data model is refreshed.
This process works perfectly.
Problem:
The problem arises when I am adding calculated columns to tCaseworks and then attempting to update the data as described above. I have added two calculated columns; has_task and status_now. I am using the following DAX code:
has_task:
    has_task =
    IF (
        CONTAINS (
            RELATEDTABLE ( tTasks );
            tTasks[casework_id]; tCaseworks[casework_id]
        );
        "Yes";
        "No"
    )
status_now:
    status_now =
    VAR TableX = RELATEDTABLE(tCaseworkStatus)
    VAR ResultX = IF(
        CONTAINS(TableX;tCaseworkStatus[casework_status_code];"Completed");"Completed";
        IF(CONTAINS(TableX;tCaseworkStatus[casework_status_code];"Dismissed");"Dismissed";
        IF(CONTAINS(TableX;tCaseworkStatus[casework_status_code];"Begun");"Begun";
        IF(CONTAINS(TableX;tCaseworkStatus[casework_status_code];"Created");"Created";
        "Find no status"))))
    RETURN
        ResultX
Both of these calculated columns work as expected as long as I do not delete the data in the model (I do have one hiccup with both columns, as described in this separate problem, but I think that is unrelated).
When the data has been deleted and I refresh the model I get the following error message:
"We cannot get the data from the data model. This is the error message we got: A circular dependency was discovered: 'tCaseworks'[status_now],'tCaseworks'[status_now],'tCaseworks'[has_task],'tCaseworks'[has_task],'tCaseworks'[status_now]."
Question:
What is creating this dependency and how can I avoid it?
My attempted solutions:
The problem only arises when there are two of these calculated columns. Either one of the two works perfectly without the other upon refreshing. I know that calculated columns are prone to circular dependency problems, but unfortunately I need to use columns and not measures. I suspect that my choice of formula is creating the problem, most likely the CONTAINS function. However, I don't know of any alternative ways of building the formulas I need. Any suggestions?
Edit:
I originally only posted a portion of my data model as I wanted the question to be as concise as possible but I guess it might have been confusing. The whole model concerns five objects from a case handling system: Claims, Cases, Caseworks, Tasks and Action Points. These objects are hierarchical, one claim can have one or more cases, but one case can only have one claim. Similarly, a case can have several caseworks, a casework can have several tasks, a task can have several action points. Additionally, the latter four can have a status attribute which is changed regularly.
I attempted to organize my data model in such a way that I had a lookup table for each object with unique values. I have many attributes for each object in my data that I did not include in the example above, and my goal was to add useful attributes through calculated columns in these tables. The data tables with the changes were intended to provide insight to the lookup tables.
I think your relationship model is a bit unusual. DAX works best with something like a dimensional fact model.
I would consider tCaseworkStatus a fact table, since it's like a log of the changes to your data. tTasks is a dimension, since it just adds an extra dimension to your data.
tCaseworks is not necessary, since it doesn't hold any actual data (only calculated data).
If you want your current model to work, it might fix your problem to delete the relationship between tTasks and tCaseworks and add a new one between tTasks and tCaseworkStatus.
Edit:
It just occurred to me that the reason you have it like this may be that you have a many-to-many relationship between tTasks and tCaseworkStatus. If that is the case, you might have to create a proper many-to-many table, which is more or less what your tCaseworks is, but you can't have relationships to the same key the way you currently do.
Edit 2:
The cause seemed to be that the RELATEDTABLE function, in conjunction with the relationship model, was somehow triggering the error. Using LOOKUPVALUE instead seems to have fixed the issue.
Context:
I am creating a dashboard in Excel based on the data model I am building in Power Pivot. The source data in the data model is based on various other Excel tables that I regularly receive and copy-paste into my workbook (their incoming structure is out of my control). My goal is to perform all data processing within Power Pivot/DAX rather than manipulating the data in the worksheets before loading it into the model.
Problem:
In my model, I have a table (tabCases) which includes status updates on all cases from a management system. This table has a column named case-ID (not unique). I need to create a lookup table with unique case IDs where I can create new columns with various KPIs for each case.
How can I do this in Power Pivot?
I found two suggestions in this article, but neither of them works for me (opt. 1 because it requires manually creating the unique ID list, and opt. 2 because I don't have database access).
In my mind there should be something really simple I could do, such as:
Add new table to data model
Set first column to be equal to DISTINCT(tabCases[caseID])
Is there such a way?
A Linkback Table might help you. Please see the link below:
https://www.sqlbi.com/articles/linkback-tables-in-powerpivot-for-excel-2013/
Thanks
I want to store employee records. I don't want to use any external libraries or frameworks. I am trying to build the data structure from scratch.
There will be three fields,
EmployeeName
Age
Salary
We also want to query like,
Get all the salary where EmployeeName = "Bill"
Get all the EmployeeName where salary > 2000
Get all the Salary where age='50'
I am open to using any language, but not any built-in package. What is the recommended data structure to achieve this?
I assume that the purpose of this exercise is self-education.
If so, "Where to begin reading SQLite source code?" is a great place to start reading to understand how this kind of software can be built.
If you really want to roll your own, I would suggest storing your data in an array of structs/objects/dictionaries (what they are called will depend on your language), hidden behind an object so that your insert/update/delete methods on the table go through well-defined access functions. Your operations can be implemented inefficiently with grep, filter, etc depending on your language. In addition to the obvious fields, include deleted as a field. That way you can just update that to delete a record, rather than try to modify the table.
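A minimal Python sketch of that suggestion follows, assuming a list of dictionaries as the storage and a soft-delete flag. The class name `EmployeeTable` and its method names are made up for illustration; nothing here is indexed, so every query is a full scan.

```python
class EmployeeTable:
    """Records live in a plain list; deleting only flips a flag."""

    def __init__(self):
        self._rows = []

    def insert(self, name, age, salary):
        self._rows.append(
            {"name": name, "age": age, "salary": salary, "deleted": False}
        )
        return len(self._rows) - 1  # row id is the position in the list

    def delete(self, row_id):
        # Soft delete: the row stays in place so other row ids remain valid.
        self._rows[row_id]["deleted"] = True

    def select(self, predicate):
        # Full scan over all live rows; inefficient but simple.
        return [r for r in self._rows if not r["deleted"] and predicate(r)]


table = EmployeeTable()
bill_id = table.insert("Bill", 50, 2500)
table.insert("Ann", 41, 1800)
table.insert("Bill", 33, 3100)

bill_salaries = [r["salary"] for r in table.select(lambda r: r["name"] == "Bill")]
well_paid = [r["name"] for r in table.select(lambda r: r["salary"] > 2000)]
```

Keeping the list private behind `insert`, `delete` and `select` is what later lets indexes be added without changing the callers.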
To make them more efficient, read through https://cstack.github.io/db_tutorial/parts/part7.html for how to write a b-tree. Then create a b-tree mapping EmployeeName to the list of indexes of records with that name, ditto for age and salary. Now modify the access methods to update the indexes for those fields when you modify the table. Your searches can now go through the b-tree to find the indexes of the records that you want, and then you can look in the table for them.
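The index idea can be sketched without writing a full b-tree. The Python below deliberately swaps in a simpler structure, a sorted list of keys searched with a hand-written binary search, so lookups are O(log n) but inserts are O(n); a real b-tree would fix the insert cost. The `SortedIndex` class and the sample rows are hypothetical.

```python
class SortedIndex:
    """Maps a field value to the row ids holding it, kept sorted by value.

    A sorted list with binary search stands in for the b-tree here.
    """

    def __init__(self):
        self._keys = []     # sorted distinct field values
        self._row_ids = []  # _row_ids[i] lists the row ids for _keys[i]

    def _lower_bound(self, key):
        # First position whose key is >= the given key.
        lo, hi = 0, len(self._keys)
        while lo < hi:
            mid = (lo + hi) // 2
            if self._keys[mid] < key:
                lo = mid + 1
            else:
                hi = mid
        return lo

    def add(self, key, row_id):
        i = self._lower_bound(key)
        if i < len(self._keys) and self._keys[i] == key:
            self._row_ids[i].append(row_id)
        else:
            self._keys.insert(i, key)
            self._row_ids.insert(i, [row_id])

    def equal(self, key):
        i = self._lower_bound(key)
        if i < len(self._keys) and self._keys[i] == key:
            return list(self._row_ids[i])
        return []

    def greater_than(self, key):
        i = self._lower_bound(key)
        if i < len(self._keys) and self._keys[i] == key:
            i += 1  # skip the key itself for a strict comparison
        return [rid for ids in self._row_ids[i:] for rid in ids]


rows = [("Bill", 50, 2500), ("Ann", 41, 1800), ("Bill", 33, 3100)]
by_name, by_salary = SortedIndex(), SortedIndex()
for row_id, (name, age, salary) in enumerate(rows):
    by_name.add(name, row_id)
    by_salary.add(salary, row_id)

bill_salaries = [rows[i][2] for i in by_name.equal("Bill")]
names_over_2000 = [rows[i][0] for i in by_salary.greater_than(2000)]
```

The queries first ask the index for row ids and only then touch the table, which is the same two-step lookup described above.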
This is massively simplified compared to what a database gives you, but you're on your way to understanding how databases work, both in terms of why they scale and why they aren't magically fast.
We have an audit table which we get from an OLTP system. It records any activity done by the user, including whether they downloaded an attachment, read a note, wrote a note, or made any change to an incident. How do we include this audit table activity in our dimensional model for an incident management system (IT service management)?
At a simple level, which is all I can offer given the level of detail in the question, look at your audit table and decide which categories of audit you want as dimensions. Perhaps there are audit_type, user_type, and audit_subtype fields or something like that? Typically you also have another field called a "measure" or "quantity", which holds numeric values and supports aggregate functions. For example, you might have store_id and product_cat as categorical dimensions, but roll up sales$ as min, max, avg, stdev grouped by different date types like month and quarter, and by other dimensions. If your data is purely categorical by date, then COUNT() is usually used as a calculated measure.
You really just need to decide how you want to be able to drill up and drill down through the data, which categories matter, and which quantities matter. Once you decide that, create a flat table with FKs to lookup tables. A star schema is simply a central fact table with a bunch of lookup tables floating around it like a star.
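A small sketch of such a layout, using Python's standard `sqlite3` module with an in-memory database. All table names, column names and sample rows are invented for illustration; the real audit table will have different fields. The only measure here is COUNT(*), as suggested for purely categorical data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Lookup (dimension) tables; names are hypothetical.
CREATE TABLE dim_audit_type (audit_type_id INTEGER PRIMARY KEY, audit_type TEXT);
CREATE TABLE dim_user (user_id INTEGER PRIMARY KEY, user_name TEXT);

-- Flat fact table: one row per audited activity, FKs to the lookups.
CREATE TABLE fact_audit (
    incident_id   INTEGER,
    audit_type_id INTEGER REFERENCES dim_audit_type(audit_type_id),
    user_id       INTEGER REFERENCES dim_user(user_id),
    audit_date    TEXT
);

INSERT INTO dim_audit_type VALUES
    (1, 'attachment_download'), (2, 'note_read'), (3, 'note_written');
INSERT INTO dim_user VALUES (1, 'alice'), (2, 'bob');
INSERT INTO fact_audit VALUES
    (101, 1, 1, '2020-01-05'),
    (101, 2, 1, '2020-01-05'),
    (101, 2, 2, '2020-01-06'),
    (102, 3, 2, '2020-02-01');
""")

# Roll up: number of audited events per activity type.
activity_counts = conn.execute("""
    SELECT t.audit_type, COUNT(*) AS events
    FROM fact_audit f
    JOIN dim_audit_type t ON t.audit_type_id = f.audit_type_id
    GROUP BY t.audit_type
    ORDER BY t.audit_type
""").fetchall()
```

Drilling down is then a matter of adding more columns (user, month, incident) to the GROUP BY.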
Hope this helps
I have to design a data warehouse model and ETL process for a class at my university. My data warehouse has to store opinions / comments about a product. Each record should consist of:
comment text (String)
product score ({0, 0.5, … , 4.5, 5})
comment author (String)
comment date (Date)
product recommendation ({Yes, No})
comment up votes (Int)
comment down votes (Int)
product pros (many Strings, e.g {price, design, durability, … }) and its count
product cons (many Strings, e.g {too loud, too heavy, price, … }) and its count
In addition, the data warehouse should store information about the product:
product category
product brand
product model
I want to create the data warehouse model first, but I have a problem with storing product pros and cons, as it is a many-to-many relationship. In a normal relational database I would simply create an associative table, but here I am not sure how to proceed; after all, I don’t want to normalize the fact table.
I am considering three approaches. The first is the one presented in the diagram below. I used the bridge table method (though I don’t know whether I used it correctly) to get rid of the many-to-many relationship. I don’t know how it will affect query performance.
The second approach is the boolean column method. In the PROS and CONS tables I can create a column for each possible value, but there can be up to 100 different pros or cons. Also, the number of possible pros or cons is not constant over time. Authors can list new pros or cons in their comments (that’s how it works in the data source), but I can’t add new columns (I shouldn’t change data in the data warehouse).
The third approach is to keep pros in the PROS table but in one column, where values are separated by commas or some other delimiter, e.g. “price, design, color”. It keeps things simple, but it is hard to analyze or slice and dice.
Which approach should I use in this situation? Which is better for loading data into the data warehouse, given that I will get all the comments from the data source and I only want to load comments that are new since the last load?
As I understand it, your first option, slightly modified, would be the best.
In the image you provided, having the Pros_Bridge_Detail table is fine. The rest needs to change.
You can remove the Pros_Bridge table that holds just the count and add that column to your COMMENT fact table instead. That is more efficient and easier to query than joining across many tables.
You said there are many kinds of pros, like price, design, durability, etc. Put those into a separate dimension.
Add a new column to your Pros_Bridge_Detail table to hold the ID of the newly created dimension that holds the product pro types (design, durability, etc.).
Now, when a product pro is added, Pros_Bridge_Detail holds one row per pro the user gave, and the ID of the new dimension identifies which pro it is.
Also, don't forget to store the comment ID in the Pros_Bridge_Detail table, as that will be your link (FK) to the COMMENT fact table.
Same can be done to Cons as well.
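The steps above can be sketched with Python's standard `sqlite3` module. This is only one possible reading of the suggestion: the table and column names (`dim_pro`, `fact_comment`, `pros_bridge_detail`), the `load_comment` helper and the sample data are all hypothetical, and each pro is assumed to be listed at most once per comment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension of pro types (price, design, ...); grows as new pros appear.
CREATE TABLE dim_pro (pro_id INTEGER PRIMARY KEY, pro_name TEXT UNIQUE);

-- Comment fact table, carrying the pros count directly.
CREATE TABLE fact_comment (
    comment_id INTEGER PRIMARY KEY,
    score      REAL,
    pros_count INTEGER
);

-- Bridge: one row per (comment, pro) pair.
CREATE TABLE pros_bridge_detail (
    comment_id INTEGER REFERENCES fact_comment(comment_id),
    pro_id     INTEGER REFERENCES dim_pro(pro_id),
    PRIMARY KEY (comment_id, pro_id)
);
""")


def load_comment(comment_id, score, pros):
    """Load one comment, adding unseen pros to the dimension as rows."""
    conn.execute(
        "INSERT INTO fact_comment VALUES (?, ?, ?)",
        (comment_id, score, len(pros)),
    )
    for name in pros:
        conn.execute("INSERT OR IGNORE INTO dim_pro (pro_name) VALUES (?)", (name,))
        (pro_id,) = conn.execute(
            "SELECT pro_id FROM dim_pro WHERE pro_name = ?", (name,)
        ).fetchone()
        conn.execute(
            "INSERT INTO pros_bridge_detail VALUES (?, ?)", (comment_id, pro_id)
        )


load_comment(1, 4.5, ["price", "design"])
load_comment(2, 3.0, ["price", "durability"])

# Slice by pro: how often each pro is mentioned and the average score.
avg_score_by_pro = conn.execute("""
    SELECT p.pro_name, COUNT(*) AS mentions, AVG(c.score) AS avg_score
    FROM pros_bridge_detail b
    JOIN dim_pro p ON p.pro_id = b.pro_id
    JOIN fact_comment c ON c.comment_id = b.comment_id
    GROUP BY p.pro_name
    ORDER BY p.pro_name
""").fetchall()
```

A new pro showing up in the source only adds a row to the dimension, so no columns ever need to change, which is the main drawback of the boolean column approach.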
I hope this explanation is clear and that it helps. Let me know if you have any issues.