Flexible hierarchies in Saiku Analytics

I have just started working with Mondrian and I am having a hard time understanding how to make hierarchies work.
Suppose that I have a Hospital dimension and I want to sum the number of hospitals that are public or private in a certain state. I also have my hospital fact table with the appropriate measure, hospital_amount.
The hierarchy I have built in the Schema Workbench is shown below:
1- State
2- Flag (Private or Public)
3- City
4- Hospital
Done this way, I can analyse things in the Saiku Analytics plugin without major concerns, provided that I maintain the presentation order of the attributes (State, Flag, City, ...). But things get a little complicated if I want to change the order in which fields are presented in the report; in other words, what if I want to build another report in Saiku without using the Flag attribute?
Even if I hide the flag, Saiku will continue using it to categorize the rest of the attributes from the hierarchy (City and Hospital).
Some people said that I need to create another hierarchy in the Schema Workbench only for the flag, but this won't let me use the flag in the drill menu of Hospital.
Is there any way to build reports in Saiku without being stuck with the hierarchy order, i.e. choosing fields from the hierarchy in a flexible way?
Thanks in advance!

You don't mention whether you are using Saiku as a BI Server plugin or standalone.
If you are using standalone, which uses Mondrian 4, you can use the "has hierarchy" attribute in your schema instead of defining a strict hierarchy. That effectively creates a hierarchy for each level, and those hierarchies can all act independently of one another.
Or in Mondrian 3 you could just do that manually, by giving each attribute its own single-level hierarchy.
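As a rough illustration of the Mondrian 3 route, a schema fragment along these lines gives each attribute its own single-level hierarchy, so Flag, State, City and Hospital can each be placed on an axis independently. The table and column names (fact_hospital, dim_hospital, hospital_id and so on) are placeholders for your own model, not something Mondrian prescribes:

    <Cube name="Hospital">
      <Table name="fact_hospital"/>
      <Dimension name="State" foreignKey="hospital_id">
        <Hierarchy hasAll="true" primaryKey="hospital_id">
          <Table name="dim_hospital"/>
          <Level name="State" column="state" uniqueMembers="true"/>
        </Hierarchy>
      </Dimension>
      <Dimension name="Flag" foreignKey="hospital_id">
        <Hierarchy hasAll="true" primaryKey="hospital_id">
          <Table name="dim_hospital"/>
          <Level name="Flag" column="flag" uniqueMembers="true"/>
        </Hierarchy>
      </Dimension>
      <Dimension name="City" foreignKey="hospital_id">
        <Hierarchy hasAll="true" primaryKey="hospital_id">
          <Table name="dim_hospital"/>
          <Level name="City" column="city"/>
        </Hierarchy>
      </Dimension>
      <Dimension name="Hospital" foreignKey="hospital_id">
        <Hierarchy hasAll="true" primaryKey="hospital_id">
          <Table name="dim_hospital"/>
          <Level name="Hospital" column="hospital_name"/>
        </Hierarchy>
      </Dimension>
      <Measure name="Hospital Amount" column="hospital_amount" aggregator="sum"/>
    </Cube>

You can keep your original State > Flag > City > Hospital hierarchy in the schema alongside these, so the guided drill path is still available when you want it.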

Related

Adding Fields to Entity for Data Import and Power Query Analysis without Touching Dynamics Solutions

I have come into one of those situations that tutorials always use as the opposite of an ideal example: a custom-built CRM, and no access to the firm that built it. For that reason, for the moment, we are not touching solutions, because I lack the documentation to make safe decisions about major changes on the Dynamics end of things.
That said, I use Power Query to analyze the data on a daily basis. For some of our needs, could I theoretically add fields to the entities, then import data into those fields, and analyze it through Power Query?
Does this route, for the time being, avoid the risk of messing up the prod environment, while giving us the ability to track new data points, add them (without creating a new form to fill out), and access the data for tracking and analysis?
Am I missing any glaring relationship issues between Dynamics and CDS, or does this keep the changes on the CDS side? Thoughts?
I believe your third party solution is managed?
However, if you wish to create fields in prod, let's say for the Account entity:
Create a new unmanaged solution and add the Account entity without any components (empty, just the Account entity), then create the new fields. As per your requirement, do not add them to any forms or views.
Once you publish, these fields are available for your use.
After all the analysis, if you wish to delete those fields, go to your newly created solution and delete the fields (delete, not just remove).
This should help and will not cause any issues.

How do I map relations in an eventstore used in an eventsourced architecture?

I am trying to wrap my head around structuring relationships in an event store. I am all new to event sourcing, so please bear with me. :-)
How should relationships be mapped in an eventstore? Can you please give me some recommendations?
Imagine, I have a domain regarding project management. I have an aggregate which is a Project. The Project aggregate root contains Tasks, Documents, Files, Folders which are collections of core entities in the Project.
I also have a ProjectBranch which can be part of the Project aggregate but it could also be looked at independently. In the ProjectBranch the previously mentioned collections can be changed, and a ProjectBranch can be merged into the Project again which updates the collections of the Project.
Some of the flow resembles a VCS system.
How should these relations be mapped and which separation of aggregates and aggregate roots should I create?
If the Project is the only aggregate, the events (I imagine) look like the following:
ProjectWasCreated [aggregate]
ProjectDocumentWasCreated
ProjectTaskWasCreated
ProjectBranchWasCreated
ProjectBranchDocumentWasCreated
(how will this event, for example, know which branch the Document belongs to?)
All events that happen in a ProjectBranch will in some way have to be replayed on the Project once the ProjectBranchWasMergedToProject event happens.
On the other hand there could be a more relational structure where there are several separate aggregates - e.g. Project, ProjectBranch, Task, Document and so on.
This would mean that the domain has a different set of events which could look like the following:
ProjectWasCreated [aggregate]
DocumentWasCreated [aggregate]
ProjectDocumentWasAttached(documentId)
ProjectBranchWasCreated(projectId) [aggregate]
DocumentWasCreated [aggregate]
ProjectBranchDocumentWasAttached(documentId)
Some of these functionalities might need to work independently outside of the Project, so they would be made as standalone modules.
Thanks :-)
Let's assume that all these elements are aggregates: Project, ProjectBranch, Task, Document, and so on.
One of the basic tenets of constructing Aggregates is that they form a transactional consistency boundary, meaning that within a single Aggregate, all elements must be consistent and satisfy associated business rules at the time of a transaction.
That is why people usually stick with small Aggregate structures, with most Aggregates having just one Entity within them. It is going to be impossible for you to keep all these elements in sync and consistent as your Project grows.
Now, on to your question: the answer regarding relationships has two parts:
All linkages between Aggregates should be in the form of Aggregate identities. If a Task is linked to a Project, then the Task aggregate event will contain ProjectId as an attribute.
You should not store aggregate structures inside one another.
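As a minimal sketch of those two points, the events might look like the following. The record and field names are assumptions for illustration, not a specific event-store API. Note that the earlier question of how an event knows which branch a Document belongs to is answered simply by carrying the branch identity on the event itself:

    import java.util.UUID;

    // Sketch only: hypothetical event types showing "link by identity".
    public class ProjectEvents {

        // Each event lives on exactly one aggregate's stream and references
        // other aggregates only through their identities, never by embedding them.
        public record ProjectWasCreated(UUID projectId, String name) {}
        public record DocumentWasCreated(UUID documentId, String title) {}
        public record ProjectDocumentWasAttached(UUID projectId, UUID documentId) {}
        public record ProjectBranchWasCreated(UUID branchId, UUID projectId) {}
        public record ProjectBranchDocumentWasAttached(UUID branchId, UUID documentId) {}

        public static void main(String[] args) {
            UUID projectId = UUID.randomUUID();
            UUID branchId = UUID.randomUUID();
            UUID documentId = UUID.randomUUID();

            // The branch event knows which branch the document belongs to
            // because it carries the branch identity as an attribute.
            System.out.println(new ProjectBranchWasCreated(branchId, projectId));
            System.out.println(new ProjectBranchDocumentWasAttached(branchId, documentId));
        }
    }

When a ProjectBranchWasMergedToProject event happens, the Project can re-attach the branch's documents using those same identities.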
If you were using an RDBMS, any syncing required between aggregates (if a Project is closed, for example) should be accomplished with the help of Domain Events.
But since you are using Event Sourcing, you don't need to do this in the background. You dynamically construct the aggregate structure, which brings us to the second point.
Like any other event-sourced projection, when you construct an aggregate object, you will need to reconstitute the internal data elements.
If you want the Project structure to be available as part of your Task projection, you make a call to the Project Application Service to retrieve the Project aggregate in real time.
So on and so forth for all linked Aggregates that you may want as part of your projection.
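A minimal sketch of that idea follows. ProjectApplicationService, the event records and TaskView are illustrative placeholders, not a particular framework's API:

    import java.util.UUID;

    // Sketch only: a Task read model that pulls the linked Project on demand
    // through the Project's own application service instead of embedding it.
    public class TaskProjection {

        // Assumed interface standing in for your Project Application Service.
        public interface ProjectApplicationService {
            ProjectView getProject(UUID projectId);
        }

        public record ProjectView(UUID projectId, String name) {}
        public record TaskWasCreated(UUID taskId, String title) {}
        public record ProjectTaskWasAttached(UUID projectId, UUID taskId) {}
        public record TaskView(UUID taskId, String title, String projectName) {}

        private final ProjectApplicationService projectService;

        public TaskProjection(ProjectApplicationService projectService) {
            this.projectService = projectService;
        }

        // Rebuild the read model for one task; the Project is fetched in real
        // time by identity rather than being stored inside the Task aggregate.
        public TaskView project(TaskWasCreated created, ProjectTaskWasAttached attached) {
            ProjectView project = projectService.getProject(attached.projectId());
            return new TaskView(created.taskId(), created.title(), project.name());
        }
    }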

Getting into designing dashboards and need some help identifying each technical layer along the way

So I will be embarking on designing a dashboard that will display KPI's and other relevant information for my team. Since I am in the early stages of this project and am not very familiar on the technical process behind designing a dashboard, I need some questions vetted out first before I go and shop for some solutions to avoid reinventing the wheel.
Here are some of my questions:
We want a dashboard that can provide real-time information from our data sources (or as close to real time as possible). What functionality allows a dashboard to update itself from concurrent data sources? From a conceptual standpoint, I can understand creating a dashboard out of Microsoft Excel and having the dashboard depend on the values you may have set within your pivot table.
How do you make a dashboard request information from multiple data sources on its own? Just like the Excel example, a user may have to go into the pivot tables to update values, but I want to know how a dashboard would request this by itself, and what the exact method is from a programming standpoint. Does the code execute itself every time you refresh the webpage?
How do you create data sources organically? I know that for some solutions, such as SharePoint BI Center, there are pre-supported data sources like an Excel sheet or SharePoint, and it's as easy as uploading your document and letting the tool handle the rest. However, there are going to be some data sources that I know will need to be fetched. Do I need to understand something else, like an event recorder, in order to navigate this issue?
Introduction
The dashboard (or report, respectively) is usually the result of a long chain of steps. Very much simplified, it could look like this:
src1
  |------\
src2     |                   /---- Dashboards
  |------+---[DWH]---[BR]---+
src n    |     |             \---- Reports etc.
  |------/   [Big Data]
Keep in mind, this is only a very, very simple structure of a data backend / frontend.
DWH means Data Warehouse, where data might be stored temporarily (you referred to this as fetching). This could be a database, a Big Data engine, or a combination of both.
After that come the Business Rules (BR). Those might be specific rules for how different departments calculate and relate data, but also simple things like algebra.
Questions
So, the main question should not be about the technology:
What software should we choose?
How can we create a dashboard?
but should instead focus on your business processes (think of it as a top-down view):
What does our core process look like? Where would we like to measure data?
How does department A calculate sales differently from department B? Should they all use the same rule?
Where does everyone store the data? Can we access it? Do we need structured data?
And, easy to forget but sometimes one of the biggest parts: is the identifier of a business object (say, a sales ID) built and formatted in the same way everywhere?
Conclusion
When those questions are at least in the back of your head and you keep working in this direction, data will more or less automatically spill out at certain points of that process.
Then it won't matter whether you use Excel, a small-to-medium app like Tableau, Tibco Spotfire, QlikView or Power BI, or you want to go full scale with a big Hadoop backend, databases and JasperReports, Apache Drill, Pentaho or SSIS on top of it... it will come out eventually.
TL;DR
Focus on the processes first. Make sure to understand them. Draft in Excel. Then proceed in getting the data and the tools you need to help your use cases. It will work out much better from a "top-down" approach than trying to solve your requirements with tools only.

Dynamically generate data based notifications platform

In our project we have a requirement to create dynamic notifications that pop up on our site when a relevant rule applies.
We use Oracle Exadata as our main database.
This feature is supposed to allow the users to create dynamic rules that will be checked periodically.
These rules may check specific fields of certain types, and may also check these fields relative to the field data of other types.
For example, if our program has a table of cars with a location column, and another table of streets with a location column (no direct relation between those two tables), we might need to notify the users when a car is in a certain street.
Is there a good platform that can help us calculate the kind of "rules" that we want to check?
We started looking at Elasticsearch and Neo4j (we have a specific module that involves graph-like relations), but we aren't sure that they would be the right solution.
Any idea would be appreciated :)
Neo4j could help you express your rules, but it sounds as if your disconnected data would rather be queried with SQL-style joins?
So if you want to express and manage your rules as predicates in the graph, you can do that easily and then get a list of applicable rules to trigger queries in other databases.
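As a rough sketch of that pattern, assuming rules are stored as (:Rule) nodes carrying the query text they should trigger; the node label, the name/sql/enabled properties and the connection details are all assumptions, and the calls follow the Neo4j Java driver, so adjust for your driver version:

    import org.neo4j.driver.AuthTokens;
    import org.neo4j.driver.Driver;
    import org.neo4j.driver.GraphDatabase;
    import org.neo4j.driver.Record;
    import org.neo4j.driver.Result;
    import org.neo4j.driver.Session;

    // Sketch only: fetch the applicable rules from the graph, then hand each
    // rule's query text to the relational side (e.g. over JDBC) for evaluation.
    public class RuleFetcher {
        public static void main(String[] args) {
            try (Driver driver = GraphDatabase.driver(
                    "bolt://localhost:7687", AuthTokens.basic("neo4j", "secret"));
                 Session session = driver.session()) {

                Result result = session.run(
                    "MATCH (r:Rule {enabled: true}) RETURN r.name AS name, r.sql AS sql");

                while (result.hasNext()) {
                    Record rule = result.next();
                    // Here you would run the rule's query against Exadata and
                    // raise a notification when it returns matching rows.
                    System.out.printf("Rule %s -> %s%n",
                            rule.get("name").asString(), rule.get("sql").asString());
                }
            }
        }
    }

The graph then only decides which rules currently apply; the heavy joins between, say, cars and streets still run where that data lives.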

How flexible is Pentaho for dynamic transformations? (user-input based parameters)

Based on the following use case, how flexible are Pentaho tools for accomplishing a dynamic transformation?
The user needs to make a first choice from a catalog (using a web interface).
Based on the previously selected item, the user has to select from another catalog (this second catalog must be filtered based on the first selection).
Steps 1 and 2 may repeat in some cases (i.e. there may be more than two dynamic and dependent parameters).
From what the user chose in steps 1 and 2, the ETL has to extract information from a database. The tables to select data from will depend on what the user chose in the previous steps. Most of the tables have a similar structure but different names based on the selected item. Some tables have a different structure, and the user has to be able to select the fields in step 2, again based on the selection in step 1.
All the selections made by the user should be saved, so the user doesn't have to repeat the selection in the future and can simply re-run the process to get updated information based on the pre-selected filters. However, he/she must be able to make a different selection and save it for further use if he/she wants different parameters.
Is there any web-based tool that allows the user to make all these choices? I built the whole process using Kettle, but not dynamically, since all the parameters need to be passed when running the process in the console. The thing is, the end user doesn't know all the parameter values unless you show them and let them choose, and some parameters depend on a previous selection. When testing I can use my test-case scenario parameters, so I have no problem, but in production there is no way to know in advance which combination the user will choose.
I found a similar question, but it doesn't seem to require user input between transformation steps.
I'd appreciate any comments about the capabilities of Pentaho tools to accomplish the aforementioned use case.
I would disagree with the other answer here. If you use CDE it is possible to build a front end that will easily handle the prompts you describe. And the beauty of CDE is that a transformation can be a native data source via the CDA data access layer. In this environment Kettle is barely any slower than executing the query directly.
The key thing with PDI performance is to avoid starting the JVM again and again; when running in a web app the JVM is already running, so performance will be good.
Also, the latest release of PDI 5 will have the "light JDBC" driver (for EE customers), which is basically a SQL interface on top of PDI jobs. So that again shows that PDI is much more these days than just a "batch" ETL process.
This is completely outside the realm of a Kettle use case. The response time from Kettle is far too slow for anything user facing. Its real strength is in running batch ETL processes.
See, for example, this slideshow (especially slide 11) for examples of typical Kettle use cases.

Resources