Who will add rows in fact table of Mondrian? Developer or Mondrian itself? - business-intelligence

I want to ask very very basic question related to Mondrian.
I have created one fact table to build Mondrian cube. Currently that fact table does not contain any rows. So, I would like to know Who will add rows in fact table of Mondrian? Developer or Mondrian itself?

The developer.
Mondrian is, roughly speaking, simply an engine that takes MDX queries and translates them into SQL queries.
More to the point, typically you'll have a database that serves as data warehouse (where you have your Mondrian cubes) and an operational database (or several), where the actual data is coming from. Though you declared the cube in a cubename.mondrian.xml file, you have given no indications to Mondrian on what the operational database looks like (it might not even look like a database -- we maintain several cubes populated from Apache logs!)
Since it's your responsibility as the developer to populate the cube, in the Pentaho world we usually use Pentaho Data Integration (also known as Kettle) as our ETL tool (which is to say, it's the tool we use to Extract data from whatever sources, Transform it into a shape more useful for our purposes, and Load it into the data warehouse)

Related

How to do real time data ingestion from Transactional tables to a Flat table

We have transaction tables in Oracle and for reporting purposes we need this data transfered in real time to another flat Oracle table in another database. The performance of the report is great with table placed in this flat table.
Currently we are using golden gate for replication to the other database and using materialized view for this but due to some problems we need to switch to some other way of populating/maintaining this flat table. What options do we have?
It is a pretty basic requirement but the solutions I can see are for batch processing. Also if there are any other solutions you feel would better serve this purpose. Changing the target database to something other is also an option as there might be more such reports coming ahead.

How can we do data analysis for DB replication project

We are facing one issue in our project i.e. Data verification issue.
The project is about Replication of data from Sybase to oracle DBs.
The table structures for Table A across Sybase, Oracle is same.
Same column and primary key combination across all the databases.
e.g. If Sybase has Table A with columns a, b and C
same table with same name and same columns will be available in different databses.
We are done with replication stuff part.But we faced some silent failure like data discrepancy just wondering if there will any tool already available for this.
Any information on his would be helpful. Thanks.
Sybase (now SAP) has a couple products that can be used for data comparisons and reconciliation:
rs_subcmp - an older, 32-bit tool that comes with the Sybase Replication Server product that can be used to compare data between
source and target; SQL reconciliation scripts can be generated from
the differences and then applied to the target to bring it in sync
with the source; if your tables are more than 1GB in size you can
still use rs_subcmp but you'll need to create multiple comparison
jobs (via where clauses) to work on different subsets of your tables
[I don't recall if rs_subcmp can be use for heterogeneous
replication setsup, eg, ASE-Oracle.]
Data Assurance (DA) - the newer, 64-bit product ... also from
Sybase ... which can also compare data and (re)sync the target(s)
from the source (either via SQL reconciliation scripts or directly);
DA is capable of handling comparisons between a handful of
different RDBMS products (eg, ASE-Oracle); I'm currently working on a
project where one of the requirements is to validate (and reconcile
where needed) 200+TB of data being migrated from Oracle to HANA and
I'm using DA for the validation/reconciliation portion of the project
As #TenG has hinted at with his answer, there's a good bit of effort involved to compare data and generate code to reconcile the differences. Rolling your own code is doable but will entail a lot of work. If you've got the money you'll likely find 3rd party tools can get most/all of the work done for you.
If you used a 3rd party product to replicate your data from Sybase to Oracle, you may want to see if the same vendor has a comparison/validation/reconciliation tool you could use.
I've worked on a few migration projects and a key part has always been data reconciliation.
I can only talk about the approaches we took, based on constraints around tools available and minimising downtime, and constraints of available space.
In all cases I took to writing scripts that worked on two levels - summary view and "deep dive". We couldn't find any tools readily available that did what we wanted in a timely enough manner. In fact even the migration tools we found had limitations (datapump, sqlloader, golden gate, etc) and hand coded scripts to handle the bits that we found to be lacking or too slow in the standard tools.
The summary view varied from project to project. It was part functional based (do the accounting figures for transactions match) for the users to verify, and part technical. For smaller tables we could just write simple reports and the diff was straight forward.
For larger tables we wrote technical reports that looked at bands of data (e.g group the PK into 1000s) collect all the column data and produce checksum, generating a report for each table like:
PK ID Range Start Checksum
----------------- -----------
100000 22773377829
200000 38938938282
.
.
Corresponding table pairs from each database were then were "diff"d against each other to highlight discrepancies. Any differences that were found could then be looked at in more detail.
The scripts were written in such a way to allow them to run in parallel looking at discrete bands. Te band ranges were tunable as well to get the best throughput. This obviously sped things up.
The scripts were shell scripts firing off sqlplus reports, and similar for the source database.
On one project there wasn't enough diskspace to do these reports, so I wrote a Java program that queried the two databases side by side, using block queues to fetch and compare rowsets. Being in memory meant this was super fast.
For the "deep dive" we looked at the details for key tables, or for tables that reports a checksum difference.
For the user reports, the users would specify what they wanted to see, and we wrote the reports accordingly.
On the last project, the only discrepancies found were caused by character set conversion issues (people names with accents weren't handled correctly).
On projects where the overall dataset was smaller we extracted the data to XML files and wrote a Java tool to processes pairs and report differences.
The SAP/Sybase rs_subcmp tool is pretty powerful and also pretty hard to use. For details see:
https://help.sap.com/viewer/075940003f1549159206fcc89d020515/16.0.3.3/en-US/feb58db1bd1c1014b134ef4efef25563.html?q=rs_subcmp
You have to pass it key field information, but once you do that, it can retry/restart the compare streams after transient differences. Pretty fancy.
rs_subcmp expects to work on Sybase data source. So to compare against Oracle, you'd probably have to setup one of those Sybase-to-Oracle gateway products ($$$$$).
Could you install the Oracle ODBC drivers and configure them to allow Sybase clients to access Oracle? I'm guessing not (but that's outside the range of my experience).
Note the "-h" option for rs_subcmp. The docs just say it runs a "fast comparison", but what it's actually doing is running queries using the hashbytes() function. Something like:
select keyfield1,keyfield2, hashbytes("Md5",datacol1,datacol2,datacol3)
from mytable
So this sort of query might be good for the "summary view" type comparison discussed above (if the Oracle STANDARD_HASH() function output matches up with the Sybase hashbytes() function (again, outside my experience))
Note, as of ASE 16, there was a bug with the hash() & hashbytes() functions running the Md5 hash option against large varbinary columns where they could use up all procedure cache, potentially crashing the server (CR 811073)

Seeking Opinion:Denormalising Fact and Dim tables to improve performance of SSRS Reports

We seem to have bit of a debate on a discussion point in our team.
We are working on a Data Warehouse in the Microsoft SQL Server 2012 platform. We have followed the Kimball Architecture to build this Data Warehouse.
Issue:
A reporting solution (built on SSRS), which sources data from this Warehouse, has significant performance issues when sourcing data from fact and dim tables. Some of our team members suggest that we extract data from facts and dims into a new set of tables using SSIS packages. This would mean we denormalise these tables into ‘Snapshot’ tables. In this way the we would not need to join these tables to create data sets within the reports. Data could be read out of these tables directly.
I do have my own worries about this; inconsistencies, maintenance of different data structures, duplication of data etc to name a few.
Question:
Would you consider creating snapshot tables (by denormalising facts and dim tables) for reporting tables a right approach?
Would like to hear your thoughts on this.
Cheers
Nithin
I don't think there is anything wrong with snapshot tables. The two most important aspects of a data warehouse are:
The data is correct.
The data is useful.
If your users are unable to extract the totals they require, in a reasonable timescale, they won't use the warehouse.
My own solution includes 3 snapshot tables. Like you, I was worried about inconsistencies. To address this we built an automated checking process. This sub-system executes a series of queries, stored on a network drive, once an hour. Any records returned by the queries are considered a fail. Fails are reported and immediately investigated by my ETL team. This sub-system ensures the snapshots and underlying facts are always aligned and consistent with each other. Drift is prevented.
That said, additional tables equals additional complexity. And that requires more time/effort to manage. Before introducing another layer to your warehouse, you should investigate why these queries are underperforming. If joins are to blame:
Are you using an inappropriate data type, for your P/F keys?
Are the FKeys indexed (some RDBMS do this by default, others do not)?
Have you looked at the execution plans, for the offending queries?
Is the join really to blame, or is it a filter applied to the dim table?
for raw cube performance my advice would be to always try to denormalize your tables and have one fact table and one table for each dimension (star schema).
If you are unsure if it will actually help you could start creating materialized views. These are kind of the best of both worlds, on the long run you should alter your etl.
In my previous job we only had flattened tables which worked quite well. Currenly we have a normalized schema but flatten it in the last step.

Move data from Oracle to Cassandra and/or MongoDB

At work we are thinking to move from Oracle to a NoSQL database, so I have to make some test on Cassandra and MongoDB. I have to move a lot of tables to the NoSQL database the idea is to have the data synchronized between this two platforms.
So I create a simple procedure that make selects into the Oracle DB and insert into mongo. Some of my colleagues point that maybe there is an easier(and more professional) way to do it.
Anybody had this problem before? how do you solve it?
If your goal is to copy your existing structure from Oracle to a NoSQL database then you should probably reconsider your move in the first place. By doing that you are losing any of the benefits one sees from going to a non-relational data store.
A good first step would be to take a long look at your existing structure and determine how it can be modified to affect positive impact on your application. Additionally, consider a hybrid system at the same time. Cassandra is great for a lot of things, but if you need a relational system and already are using a lot of Oracle functionality, it likely makes sense for most of your database to stay in Oracle, while moving the pieces that require frequent writes and would benefit from a different structure to Mongo or Cassandra.
Once you've made the decisions about your structure, I would suggest writing scripts/programs/add a module to your existing app, to write the data in the new format to the new data store. That will give you the most fine-grained control over every step in the process, which in a large system-wide architectural change, I would want to have.
You can also consider using components of Hadoop ecosystem to perform this kind of (ETL) task .For that you need to model your Cassandra DB as per the requirements.
Steps could be to migrate your oracle table data to HDFS (using SQOOP preferably) and then writing Map-Reduce job to transform this data and insert into Cassandra Data Model .

Migrating data between 2 databases in Oracle 9i

I am new to Oracle. Since we have rewritten an earlier application , we have to migrate the data from the earlier database in Oracle 9i to a new database , also in 9i, with totally different structures. The column names and types would be totally different. We need to map the tables and columns , try to export as much data as possible, eliminate duplicates, and fill empty values with defaults.
Are there any tools which can help in mapping the elements of the 2 databases , with rules to handle duplicates, and default values and migrate the data ?
Thanks,
Chak.
If your goal is to migrate data between two very different schemas you will probably need an ETL solution (ETL=Extract Transform Load).
An ETL will allow you to:
Select data from your source database(s) [Extract]
apply business logic to the selected data [Transform] (deal with duplicates, default values, map source tables/columns with destination tables/columns...)
insert the data into the new database [Load]
Most ETLs also allow some kind of automatisation and reporting of the loads (bad/discarded rows...)
Oracle's ETL is called Oracle Warehouse Builder (OWB). It is included in the Database licence and you can download it from the Oracle website. As most Oracle products it is powerful but the learning curve is a bit steep.
You may want to look into the [ETL] section here in SO, among others:
What ETL tool do you use?
ETL tools… what do they do exactly? In laymans terms please.
In many cases, creating a database link and some scripts a'la
insert into newtable select distinct foo, bar, 'defaultvalue' from oldtable#olddatabase where xxx
should do the trick

Resources