In SAP HANA I am used to creating Calculation Views.
Previously, I learned that Calculation Views (which after compilation are column views) are to be preferred over database SQL views.
Now, with CDS views, I am not sure whether this is still the case, especially with regard to performance.
Also, what is the difference now between a table function (which replaced scripted calculation views) and a CDS view?
OK, this is a question that I believe requires some background to answer.
A long, long time ago...
When SAP HANA was first developed, it heavily reused concepts and technology from other, already existing SAP products (TREX, P*TIME, MaxDB, Business Warehouse Accelerator).
One of the fundamental elements of the high query performance was (and is) the column store data storage, which came largely from the TREX/BWA products. These products, in turn, had been solutions to very specific problems (full-text search for catalogs and speeding up analytical queries from the SAP Business Warehouse product).
The BWA use case in particular is reflected in the column views of SAP HANA. Because the use case was limited to supporting SAP BW queries, no general SQL/relational query support was required (e.g. no arbitrary join-chain optimizations, no SQL features beyond SQL:92, etc.), whereas other, rather exotic features that SAP BW could use (like "vertical join") were built into a query tool/engine ("engine" clearly was a very popular term with the SAP developers).
Once HANA proved successful as a platform to run SAP BW on, the next step was to add flexibility and make more general platforms like SAP NetWeaver (the software that SAP's business solution products run on/with) work on SAP HANA. Now SQL features were added, and those required additional capabilities from the query optimizer and execution "engines".
Query optimization had to be flexible and fast, and it had to deliver query performance that would still beat the offerings of the existing RDBMS vendors (which had been around for 40+ years).
This clearly is a hard problem, and throwing in the operational aspects of DB development (scaling, solution deployment, data federation, etc.) makes it even harder.
This led to overlapping development of different tools addressing different aspects of DB development.
SQL support and the underlying SQL optimizer were made more powerful, so much so that (some) SQL queries could be as fast as or faster than those modeled in calculation views. And since both of these "query frontends" eventually had to talk to the same internal data structures (row/column store), it was desirable to have just a single query optimizer that would support all the different use cases.
Somewhere around HANA 1 SPS11/12 most calculation views started to be "unrolled" internally to feed into the common optimizer (that was what the "Execute in SQL Engine" flag was about).
I'd say, since then, the performance argument for using calculation views only holds in very specific circumstances.
I mentioned the overlapping developments, and CDS (Core Data Services) is one of them. The idea here is a very different one from SQL. While SQL gives you "the way to talk to the database", CDS wants to give your application a single data definition that is used by the UI, the program logic, and the data storage/query execution.
SQL != CDS
This probably needs some context (again): a common pattern in how application developers use SQL databases is that the application is written in some form of object-oriented implementation and the communication with the DB is left to a mapping layer/library (e.g. O/R mappers). This means that the knowledge of what the application is about (aka business process knowledge) is spread out across the application.
There is some information about it in the UI (labels, formatting, visibility, ...), some of it is in the application-object model (object dependencies, hierarchies, value domains...) and then some of it is in the queries against the database.
Such scattered knowledge/definitions make it hard to keep changes consistent, which slows down the development process and, in turn, prolongs the time until the application can run and deliver some positive outcome.
"Time-to-value" is the thing under optimization here as this is important for companies that give the promise of "success through innovation".
OK, so this CDS thing is now part of the development models proposed by SAP and, nearly en passant, also addresses topics like schema evolution and deployment of the data model. It is, in fact, independent of the actual database platform, as the CDS for ABAP variety shows.
How does this lead back to query performance? It does not really.
CDS' advantage is that one can provide more information about the data model than what is possible in HANA SQL.
Associations and joins with cardinality declarations (albeit now retrofitted to plain SQL) can enable the optimizer to apply additional optimizations. Yet the same optimizer and the same query execution "engines" are used here.
So, from a (query execution) performance point of view, it does not make a big difference, as long as no query semantics are required for which CDS does not have syntax (e.g. some window functions).
The main point of CDS really is the performance of the application development process, and whether that works well with how you do development depends on how much of it you can use.
Now for the question "scripted calc view" vs. "table function" vs. "CDS view".
Looking at these different object types from the point of view of "what can I do with them functionally?" leads to the observation "basically, the same".
The difference lies in how these can be optimized (scripted calc views cannot be generally unrolled into the global query to be optimized), and what one can do with the object once created.
Table functions allow for very easy reuse across multiple views and queries. They also let you pass parameters into the function (similar to parameterized views) and, in addition, allow for imperative coding.
Functionally speaking, table functions are a kind of Swiss Army knife: one can do nearly anything with them, and they can still be part of global query optimization.
CDS views, as mentioned above, are nothing "special" in terms of query runtime or optimization. The main reason why CDS views are "a thing" is that, with HANA, SAP started to create development models (such as XS, XSA, CAM) that revolve around "virtual data models".
The idea behind those is that the structure of tables is very often stable and changes only a little over time.
In a way, this is the "write-schema" of applications that enter the data into tables.
The "read-schema" is most of the time different from that. Queries re-combine the normalized data into records that the application can map into objects. This allows applications to look at the data differently than the original application.
With "virtual data models" these queries are baked into tangible development artifacts (the views) that can be reused across the application. In fact, these can be treated as if this was the database with its tables, presented in a way that makes sense for the application.
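To make the write-schema/read-schema distinction concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for HANA. It is plain SQL, not CDS syntax, and the tables (customers, orders) and the view (order_overview) are hypothetical examples, not part of any SAP data model.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create hypothetical normalized tables (write-schema) and a view over them (read-schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Write-schema: normalized tables, as the data-entry application stores them.
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id),
            amount REAL NOT NULL
        );

        -- Read-schema: a reusable view that recombines the normalized data
        -- into records shaped for a consuming application.
        CREATE VIEW order_overview AS
        SELECT o.id AS order_id, c.name AS customer_name, o.amount
        FROM orders AS o
        JOIN customers AS c ON c.id = o.customer_id;
        """
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "ACME"), (2, "Globex")])
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(10, 1, 99.5), (11, 2, 12.0), (12, 1, 7.25)],
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    # Consumers query the view exactly as if it were a table.
    for row in conn.execute("SELECT * FROM order_overview ORDER BY order_id"):
        print(row)
```

The consuming code never touches customers or orders directly; if the write-schema changes, only the view definition has to follow.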
Once again, whether that benefits your application development depends on what your development process looks like.
Can you use HANA without CDS? Absolutely, and there are many areas where CDS falls short (e.g. its limited syntax and feature mapping to HANA features), but it does have its merits.
Should you abandon calculation views?
I would not necessarily change existing developments if they still serve their purpose, but calculation views certainly are an odd development object. Training people in both those and SQL is most likely overly expensive compared to just sticking to SQL.
Personally, I prefer code-based SQL development (better tooling available, easier comparison with other DBMSs, no need for Web IDE/HANA Studio).
The only thing SQL-based development does not provide is the extended annotations/semantic information used by the SAP analytic frontend tools (SAC & BO); these really are specific to CDS and Information Models (calculation views) and are barely used by other analytic tools.
And that's my take on it.
I would add that Calculation Views are semantically richer. A SQL view does not know about measures, dimensions, or hierarchies. https://blogs.sap.com/2019/08/26/what-is-the-difference-calcview-versus-sql-view/
The difference from the execution plan point of view keeps shrinking. In HANA 2.0 SP4, most graphical calc views are turned internally into a single SQL statement executed by the SQL engine. So in that sense, using a CalcView gives you the additional information about the model plus the query performance of the SQL engine.
Lars' explanation of CDS is perfect. Nothing to add there.
But imagine a situation where you can't create a table function because of a limited license (aka runtime version). In that case, just stay with scripted views.
The main advantage of HANA artifacts over CDS at present is the ability to use input parameters in complex cases to optimize resources and query performance, when your logic is pushed down into the DB instead of the application server/app. But many native SQL features are still not available in graphical views (for example, EXISTS or a JOIN on BETWEEN), so I think that in 10 years HANA artifacts will have become "very rare".
So learn CDS syntax :)
It is always a pleasure reading an article or POV from Lars, on any medium (StackOverflow, SAP blog, article, Twitter).
I just want to point out that another thing I miss in SQL scripting (SP, TF, SF) is the join optimization and SQL propagation that Information Views have.
For me, this is the key to flexible models (apart from dynamic join, which is only relevant for certain scenarios): delivering one view that performs well depending on which columns the user or app requests.
For the semantics, I can simply expose a TF inside an information view to add some.
You could tell me that CDS has both options available (join optimization, SQL propagation, and annotations), but for advanced or complicated scenarios (window functions are not available in CDS), and also for non-SAP developers, this approach will be simpler and the go-to approach for beginners.
Is there a way to export all Relational Models for a schema in Oracle Data Modeler to a single PDF file, with each model on a separate page?
I have an ERD for my schema consisting of about 90 tables. This full model can be hard to read. To account for this, I have created several additional Relational Models that cover subsets of data. For example, one model consists of just five tables pulled from the full model, showing their mappings. This is to better demonstrate the relationships of this subset of data in our application's workflow. They relate to several other tables, but these five on their own more easily demonstrate how these items work together from a user's perspective.
I can print each relational diagram out to separate PDFs using File -> Print Diagram -> To PDF File..., but this leaves me having to manually combine nearly a dozen different PDFs. Is it possible to export them all out to a single file at once? Data Modeler seems to only focus on the open diagram that is in focus, and ignores everything else when I'm working with print options.
If that's not possible, is there at least a way to print all the models to separate files with a single click? Opening each model and printing them separately is overly time-consuming.
Not today.
What you can do in version 4.2 is implement your smaller diagrams as SubViews within your design/model.
When you run the 'All Tables' report and export to HTML, you get a TOC/index page with links to the data dictionary reports for each object, and you also get links to each SubView diagram in the HTML. So it's a single report, with different pages for each diagram. It's just HTML instead of PDF.
It's NOT PDF, but I would argue that HTML is slightly easier to work with.
We could always log an ER (enhancement request) to give you exactly what you're asking for, though. As a workaround, I'm assuming that Adobe Writer could take multiple PDF files and combine them into a single document?
I will be embarking on designing a dashboard that will display KPIs and other relevant information for my team. Since I am in the early stages of this project and not very familiar with the technical process behind designing a dashboard, I need some questions answered before I go shopping for solutions, to avoid reinventing the wheel.
Here are some of my questions:
We want a dashboard that can provide real-time information from our data sources (or as close to real-time as possible). What function allows a dashboard to update itself from concurrent data sources? From a conceptual standpoint, I can understand creating a dashboard in Microsoft Excel and having the dashboard depend on the values you have set within your pivot table.
How do you make a dashboard request information from multiple datasources on its own? Just like the excel example, a user may have to go into the pivot tables to update values, but I want to know how would a dashboard request this by itself and what is the exact method from a programming standpoint? Does the code execute itself every time you refresh the webpage?
How do you create datasources organically? I know for some solutions such as SharePoint BI Center, there are pre-supported datasources like an excel sheet or SharePoint and it's as easy as uploading your document and letting the design handle the rest. However, there are going to be some datasources that I know that will need to be fetched. Do I need to understand something else like an event recorder in order to navigate this issue?
Introduction
The dashboard (or a report, respectively) is usually the result of a long chain of steps. Very much simplified it could look like this:
src1
|------\
src2   |               /---- Dashboards
|------+---[DWH]-[BR]-+
src n  |     |         \---- Reports etc.
|------/ [Big Data]
Keep in mind, this is only a very, very simple structure of a data backend / frontend.
DWH means Data Warehouse, where data might be stored temporarily (you referred to this as fetching). This could be a database, could be a Big Data engine, could be a combination of both...
Afterwards, there are Business Rules (BR). Those might be specific rules for how different departments calculate and relate to data, but also simple things like arithmetic.
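A toy version of the src → DWH → BR → dashboard chain, as a minimal Python sketch: two hypothetical in-memory sources with different field names are mapped into one common record layout (standing in for the DWH), a single business rule is applied, and a KPI value comes out. The source contents, field names and the "net sales = gross minus returns" rule are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class SaleRecord:
    sales_id: str
    department: str
    gross: float
    returns: float


# Hypothetical sources (src1, src2); in practice these would be databases, files, APIs...
SRC1 = [{"id": "S-1", "dept": "A", "gross": 100.0, "returns": 10.0}]
SRC2 = [{"sale": "S-2", "department": "B", "amount": 50.0, "refunded": 0.0}]


def load_into_dwh() -> list[SaleRecord]:
    """Map each source's field names onto one common record layout (the DWH step)."""
    dwh = [SaleRecord(r["id"], r["dept"], r["gross"], r["returns"]) for r in SRC1]
    dwh += [SaleRecord(r["sale"], r["department"], r["amount"], r["refunded"]) for r in SRC2]
    return dwh


def net_sales(record: SaleRecord) -> float:
    """Business rule (BR): one agreed definition of 'sales' for every department."""
    return record.gross - record.returns


def kpi_total_net_sales(dwh: list[SaleRecord]) -> float:
    """The number a dashboard tile would display."""
    return sum(net_sales(r) for r in dwh)


if __name__ == "__main__":
    print(kpi_total_net_sales(load_into_dwh()))  # 140.0
```

Keeping the rule in one function is the point of the BR stage: if departments calculated sales differently, that difference would live in one visible place rather than in every report.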
Questions
So, the main question should not be about the technology:
What software should we choose?
How can we create a dashboard?
but rather focus on your business processes (think of it as a top-down view):
What does our core process look like? Where would I like to measure data?
How does department A calculate sales compared to department B? Should everyone use the same rule?
Where does everyone store the data? Can we access it? Do we need structural data?
And, very easy to forget but sometimes one of the biggest parts: is the identifier of a business object (say, a sales ID) built and formatted the same way everywhere?
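To illustrate the identifier problem, here is a small Python sketch that normalizes sales IDs arriving from different systems into one canonical form before they are joined. The input formats and the canonical form SALES-000123 are hypothetical, and the sketch assumes the first run of digits in each raw ID is the ID number.

```python
import re

# Assumption: the first run of digits in a raw ID is the actual sales number.
_DIGITS = re.compile(r"\d+")


def normalize_sales_id(raw: str) -> str:
    """Extract the numeric part of a sales ID and render it as SALES-<6 zero-padded digits>."""
    match = _DIGITS.search(raw)
    if match is None:
        raise ValueError(f"no numeric part in sales id: {raw!r}")
    return f"SALES-{int(match.group()):06d}"


if __name__ == "__main__":
    # The same business object as it might appear in three different source systems.
    for raw in ["123", "S-000123", " sales_123 "]:
        print(normalize_sales_id(raw))  # SALES-000123 each time
```

Without such a step, the three variants would never match in a join, and the dashboard would silently count one sale as three or as none.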
Conclusion
When those questions are at least in the back of your head and you keep working in this direction, data will more or less automatically spill out at certain points of that process.
Then it won't matter whether you use Excel, a small-to-medium app like Tableau, TIBCO Spotfire, QlikView, or Power BI, or go full scale with a big Hadoop backend, databases, and JasperReports, Apache Drill, Pentaho, or SSIS on top of it... it will come out eventually.
TL;DR
Focus on the processes first. Make sure you understand them. Draft in Excel. Then proceed to get the data and the tools you need for your use cases. A "top-down" approach will work out much better than trying to solve your requirements with tools alone.
In working with Magento's EAV system, I have developed a PHP-based tool that assembles the tables' data back into a tabular format and identifies "holes" in the attributes between or across products. It works with both the product and the category system, and makes it possible to determine whether an attribute has NOT been inserted into a particular product, whether data is consistent across stores, and which stores are inconsistent: things that are not easy with normal SQL queries and that might be tough to program with native Magento code.
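The core idea behind such a tool, pivoting EAV rows back into one record per entity and reporting attributes that some entities lack, can be sketched in a few lines of Python. This is not the tool described above and does not use Magento's actual schema; the (entity_id, attribute, value) triples are hypothetical.

```python
from collections import defaultdict

# Hypothetical EAV rows: (entity_id, attribute_code, value).
EAV_ROWS = [
    (1, "name", "Shirt"), (1, "color", "red"), (1, "size", "M"),
    (2, "name", "Mug"), (2, "color", "white"),
    (3, "name", "Poster"),
]


def pivot(rows):
    """Reassemble EAV triples into one dict of attribute -> value per entity."""
    table = defaultdict(dict)
    for entity_id, attribute, value in rows:
        table[entity_id][attribute] = value
    return dict(table)


def find_holes(table):
    """For each entity, list the attributes that some other entity has but it lacks."""
    all_attributes = set()
    for attrs in table.values():
        all_attributes |= set(attrs)
    holes = {}
    for entity_id, attrs in table.items():
        missing = all_attributes - set(attrs)
        if missing:
            holes[entity_id] = sorted(missing)
    return holes


if __name__ == "__main__":
    print(find_holes(pivot(EAV_ROWS)))  # {2: ['size'], 3: ['color', 'size']}
```

Because nothing here depends on Magento, the same pivot/hole check would apply to any system that stores data as entity-attribute-value triples.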
I plan to post a link to this on GitHub and here when it is complete.
My question is: are there other platforms or systems that make use of EAV? I would like to make my application as system-agnostic as possible. Thanks.
I need to compose a report using multiple subreports, "chained" together at runtime in a C# Forms project.
The subreports each represent a subtest of a product, and the data needs special formatting to make sense to the report users (special graphs, sensible column names with/without engineering details, etc.).
I imagine that every subreport has a subreport field into which I can insert the next subreport at runtime. Obviously, the first (main) report has a subreport field as well, while the final subreport (the summary subreport) does not.
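The chaining described above is essentially a singly linked list of report sections. Here is a language-neutral sketch in Python (not C#, and not the API of any reporting product; the Section class and its fields are hypothetical) of building such a chain at runtime and walking it from the main report to the summary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Section:
    """Hypothetical report section; `next` plays the role of the subreport slot."""
    title: str
    next: Optional["Section"] = None


def build_chain(titles: list[str]) -> Section:
    """Link sections in order; the last one (the summary) gets no subreport."""
    head: Optional[Section] = None
    for title in reversed(titles):
        head = Section(title, head)
    if head is None:
        raise ValueError("at least one section is required")
    return head


def render(head: Section) -> list[str]:
    """Walk the chain from the main report to the summary."""
    out = []
    node: Optional[Section] = head
    while node is not None:
        out.append(node.title)
        node = node.next
    return out


if __name__ == "__main__":
    chain = build_chain(["Main", "Subtest 1", "Subtest 2", "Summary"])
    print(render(chain))  # ['Main', 'Subtest 1', 'Subtest 2', 'Summary']
```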
Is it possible to chain subreports together at runtime?
Does anyone out there have a sample?
Kind Regards
Jes
I imagine this is possible with the Reporting Services product, but I don't know how to do it. In our experience as part of the ActiveReports team, we've found that subreports are also not always the most performant and memory-efficient way to accomplish this.
For information about how we suggest to do this with our ActiveReports product see the following explanation:
http://www.datadynamics.com/Help/ActiveReports6/arHOWInsertOrAddPages.html
Scott Willeke
GrapeCity - Data Dynamics