I am new to the reporting world. I wanted to know which is the right solution to adopt for generating a single report by querying data from multiple databases. We are planning to use a reporting solution like Jasper Reports or BIRT. Generally the databases are going to be PostgreSQL.
Please do feel free to let me know about any other better solutions as well.
Thanks.
With BIRT you can use as many data sources as you like, independently or together as joint data sets. A Joint Data Set is basically a join you create at the report level. The cool thing here is that you can in effect create the join across databases, even across instances.
All the expected sources are supported, even some not so expected ones: any JDBC database, web services, flat files, POJOs (via a Scripted Data Source), XML, native DB drivers (i.e. Oracle, SQL Server, etc...). You can even use one BIRT report as a data source for a secondary BIRT report. That gets a bit beyond the scope of the question, but it opens up a huge amount of options in terms of deployment and flexibility.
In JasperReports, if you are generating the report on the server and streaming it back to the client as PDF or HTML, then you can use any data sources you want (a minimal sketch follows this list), be it:
Multiple databases
Objects, i.e. Java Beans
Text files/XML
Web services
... etc
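For illustration, here is a minimal sketch of that server-side fill. The compiled report file multi_db_report.jasper and the CUSTOMER_DS parameter name are invented for the example: the report's own SQL query runs against a PostgreSQL connection, while a second data set fetched from anywhere else is passed in as a parameter for a subreport or dataset.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    import net.sf.jasperreports.engine.JasperExportManager;
    import net.sf.jasperreports.engine.JasperFillManager;
    import net.sf.jasperreports.engine.JasperPrint;
    import net.sf.jasperreports.engine.data.JRMapCollectionDataSource;

    public class MultiSourceReport {
        public static void main(String[] args) throws Exception {
            // Primary source: the report's own SQL query runs against this connection.
            Connection pg = DriverManager.getConnection(
                    "jdbc:postgresql://localhost:5432/sales", "report", "secret");

            // Secondary source: rows fetched from anywhere else (another database,
            // a web service, a flat file, ...) passed to a subreport/dataset parameter.
            List<Map<String, ?>> customers = new ArrayList<>();
            Map<String, Object> row = new HashMap<>();
            row.put("name", "ACME Corp");
            customers.add(row);

            Map<String, Object> params = new HashMap<>();
            params.put("CUSTOMER_DS", new JRMapCollectionDataSource(customers));

            // Fill the compiled report and export the result as PDF.
            JasperPrint print = JasperFillManager.fillReport(
                    "multi_db_report.jasper", params, pg);
            JasperExportManager.exportReportToPdfFile(print, "report.pdf");
        }
    }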
Related
I have around 30,000 Excel sheets containing standardized reports.
The goal is to provide an online dashboard to view the data.
My first thought as a programmer is to create a database, find a way to import the data, and then build a front end on top to create a customized dashboard.
Is there an easier way to do this?
If your reports are always going to be in Excel, I don't think converting them to a database would be a good approach. It would only add the overhead of converting the data and creating the front end for it; that would be over-engineering.
Instead, you can use data visualization tools like Retool, which already offer visualization components on top of any source data. The source can be anything, be it an Excel sheet or a database.
There are other alternatives as well, like Redash, which also offers connections to CSV files.
However, if you still feel that creating a database like MySQL (or any other database of your choice) is the better approach, then you should consider using Metabase or Superset. These are excellent tools for data visualization and only require the database credentials to connect to the DB. They are feature-rich, accepted within the industry, and you can create possibly all the visualizations you need.
Also, the best part is that all the tools I have mentioned here are totally open source or have a nice trial period.
FYI, for a better and more effective solution, I would recommend running a SQL Server instance on your machine, uploading the Excel reports into it, and then connecting that SQL Server instance to Metabase. That would do everything you need, and you don't have to spend time writing code to manage all this stuff. A rough sketch of the loading step follows.
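As a sketch of that loading step, assuming Apache POI and the Microsoft JDBC driver are on the classpath (the file, table, and column names here are made up):

    import java.io.File;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    import org.apache.poi.ss.usermodel.Row;
    import org.apache.poi.ss.usermodel.Sheet;
    import org.apache.poi.ss.usermodel.Workbook;
    import org.apache.poi.ss.usermodel.WorkbookFactory;

    public class ExcelToSqlServer {
        public static void main(String[] args) throws Exception {
            try (Workbook wb = WorkbookFactory.create(new File("report.xlsx"));
                 Connection conn = DriverManager.getConnection(
                         "jdbc:sqlserver://localhost:1433;databaseName=reports",
                         "loader", "secret");
                 PreparedStatement ps = conn.prepareStatement(
                         "INSERT INTO report_rows (region, amount) VALUES (?, ?)")) {

                Sheet sheet = wb.getSheetAt(0);
                for (Row row : sheet) {
                    if (row.getRowNum() == 0) continue; // skip the header row
                    // Assumes every data row has both cells populated.
                    ps.setString(1, row.getCell(0).getStringCellValue());
                    ps.setDouble(2, row.getCell(1).getNumericCellValue());
                    ps.addBatch();
                }
                ps.executeBatch(); // one round trip for all rows
            }
        }
    }

In practice you would loop over the 30,000 files and derive the target table from each sheet's layout, but the batched insert pattern stays the same.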
We are working in a very big organization: many databases (of many types), many schemas, many users.
Does LB have to work with some source control (for locking the files when many users in the organization work against the same DB, same schema, etc.)?
What is the best practice for working with LB in a very big organization with many concurrent users?
Can SQLcl generate a general SQL format type, or just the XML format type?
Is there some integration with SQL Developer? I mean, suppose a user changes an object via SQL Developer; what happens then?
We get this type of question all the time; after folks get a handle on how to automate DB changes, the next step is typically to add it into an existing CI/CD workflow.
Yes, Liquibase works with any source control. Most users are on Git, but you can use Git, TFS, SVN, CVS... Once you are up and running with Liquibase, you just need to make sure that your scripts are in source control and you are good to go.
Besides third-party source control tools, Liquibase has a tracking table called DATABASECHANGELOG that keeps track of the changes applied to your database by Liquibase deployments.
Here is some more information about getting started and how Liquibase works: https://www.liquibase.org/get_started/how-lb-works.html
Liquibase has one more table that it uses internally, called DATABASECHANGELOGLOCK.
This table was designed to prevent multiple Liquibase users from running deployments concurrently, which could leave the database in a bad state. Once the Liquibase deployment (the liquibase update command) is done, the DATABASECHANGELOGLOCK is released so the next Liquibase user can deploy.
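For reference, the same update can be triggered from Java instead of the CLI via the liquibase.Liquibase facade. A minimal sketch (the connection details and changelog path are placeholders for your own setup):

    import java.sql.Connection;
    import java.sql.DriverManager;

    import liquibase.Contexts;
    import liquibase.LabelExpression;
    import liquibase.Liquibase;
    import liquibase.database.Database;
    import liquibase.database.DatabaseFactory;
    import liquibase.database.jvm.JdbcConnection;
    import liquibase.resource.ClassLoaderResourceAccessor;

    public class RunLiquibaseUpdate {
        public static void main(String[] args) throws Exception {
            // Placeholder connection details - point this at your own database.
            Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost:5432/appdb", "app", "secret");

            Database database = DatabaseFactory.getInstance()
                    .findCorrectDatabaseImplementation(new JdbcConnection(conn));

            // Equivalent to "liquibase update": takes the DATABASECHANGELOGLOCK row,
            // applies pending changeSets, records them in DATABASECHANGELOG,
            // then releases the lock for the next user.
            Liquibase liquibase = new Liquibase(
                    "db/changelog.xml", new ClassLoaderResourceAccessor(), database);
            liquibase.update(new Contexts(), new LabelExpression());
        }
    }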
You can use both SQL and XML formats (or even JSON and YAML formats).
When using SQL, you have a few options:
The best option is to use formatted SQL changeLogs (see the example after this list): https://www.liquibase.org/documentation/sql_format.html
https://www.liquibase.org/get_started/quickstart_sql.html
You can also use plain raw SQL files referenced from an XML changeLog:
https://www.liquibase.org/documentation/changes/sql_file.html
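Here is a minimal formatted SQL changeLog as an example (the author and table names are invented); the special comments carry the changeSet metadata and an explicit rollback statement:

    --liquibase formatted sql

    --changeset bob:create-person-table
    create table person (
        id int primary key,
        name varchar(50) not null
    );
    --rollback drop table person;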
When using XML, you can find all the available change types (also called changeSets) on the following page (listed on the left of the page):
https://www.liquibase.org/documentation/changes/
XML changeLogs are more agnostic and can sometimes be reused across different database platforms when doing migrations. Also, many of the change types in XML can be rolled back automatically. This is possible with XML because Liquibase uses its own built-in functions to figure out inverse statements, e.g. turning "create table" into "drop table".
For each of those changeSets you can find out whether it is eligible for auto rollback (at the bottom of the page). For example, the createTable changeSet is listed as Auto Rollback = yes (an example follows the link below):
https://www.liquibase.org/documentation/changes/create_table.html
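For instance, a minimal XML changeLog with a createTable changeSet (names invented); because Liquibase knows the inverse of createTable, this one can be rolled back automatically:

    <databaseChangeLog
        xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
            http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-3.8.xsd">

        <!-- Auto Rollback = yes: Liquibase can derive "drop table person" itself -->
        <changeSet id="create-person-table" author="bob">
            <createTable tableName="person">
                <column name="id" type="int">
                    <constraints primaryKey="true"/>
                </column>
                <column name="name" type="varchar(50)"/>
            </createTable>
        </changeSet>
    </databaseChangeLog>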
Is it possible to use the saiku-ui component with a different jolap provider than mondrian, or with a different server backend than the saiku-server component?
I have been looking but I have not found an architecture description of how these pieces fit together and what interfaces they use to communicate. Can anyone point me towards an understanding of what the saiku-ui wants to speak with and what the saiku-server is providing?
The reason for my interest is that I have a set of data spread across hundreds of csv files that I would like to query with a pivot and charting tool. It looks like the standard way to use this with saiku would be to have an ETL process to load into an RDBMS. However, this would not be a simple process, because the files, their content, and the way the files relate to each other vary, so the ETL would have to do a lot of inspection of the data sources to figure it out.
Given this it seems to me that I would have three options in how to use saiku:
1) write a complex ETL to load into an RDBMS, and then use a standard JDBC driver to provide the data to mondrian. A side function of the ETL would be to analyze the inputs and write the mondrian schema file describing the cubes.
2) write a jdbc driver to access the data natively. This driver would parse SQL and provide access to the underlying tables; essentially a custom read-only DBMS written on top of the csv files. The jdbc connection would be used by mondrian to access the data. A side function of this custom DBMS would be to produce the mondrian schema file.
3) write a tool that provides a jolap interface to the native data (accepting discovery and MDX queries). This would bypass mondrian entirely and interface directly with the UI.
I may be a bit naive here, but I consider each of the three options feasible. Option #1 is my least preferred because of the likelihood of the data in the RDBMS becoming out of sync with the csv files. Option #3 is most preferred because the data are simple, so not much aggregating is required, and I suspect that MDX will be easier to parse than SQL.
So, if I could produce my own jolap data source, would it be possible to hook the saiku-ui tools up to it? Where would I look to find out the interface configuration details?
Many years ago, #ronaldbouman created xmondrian - a set of tools with an OLAP server and web UI tools for XMLA browsing and visualisation. But that project has not been updated and has no source code available.
I just updated the OLAP server and libraries to the latest versions.
You can get it here and build it yourself:
https://github.com/Muritiku/xmondrian-build
You can use the web package as an example. The mondrian server works with the saiku-ui.
IMHO, I would not be as confident as you are, because it took Julian Hyde more than a decade to build Mondrian (MDX->SQL) and Calcite (SQL), which cover your last two proposals.
You might simply consider using Calcite, or even better Dremio. Dremio has a JDBC interface and can query directories of CSV files in SQL. I tested Saiku over Dremio successfully (with a schema based on two separate RDBMSs). Just be careful to set up the tables' schemas accordingly in the Mondrian v4 schema.
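To give an idea, querying a directory of CSV files through Dremio's JDBC driver looks roughly like this; the dfs path, credentials, and column names are assumptions, so check your own Dremio setup:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class DremioCsvQuery {
        public static void main(String[] args) throws Exception {
            // Direct connection to a Dremio coordinator (default JDBC port 31010).
            Connection conn = DriverManager.getConnection(
                    "jdbc:dremio:direct=localhost:31010", "dremio", "secret");

            // dfs."/data/csv" is an assumed file-system source promoted as a dataset;
            // CSV columns come through as text, hence the CAST for aggregation.
            try (Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery(
                         "SELECT region, SUM(CAST(amount AS DOUBLE)) AS total "
                       + "FROM dfs.\"/data/csv\" GROUP BY region")) {
                while (rs.next()) {
                    System.out.println(rs.getString("region") + " -> " + rs.getDouble("total"));
                }
            }
        }
    }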
Best regards,
Fabrice Etanchaud
Dremio
I am new to ETL migration. I have worked with Talend, but have not yet faced the task of migrating a large ETL project from one tool to another (IBM Data Manager to Informatica PowerCenter or Informatica Developer).
I am looking for general guidelines on migrating jobs from one tool to another, and of course for my specific case.
To be more clear:
The database sources and targets will be the same; what I have to migrate is the ETL part itself.
The approach will be a parallel run, as suggested in this blog post:
Parallel Run
In my case I do not have to migrate the whole DWH, only the ETL, as the old software will become a legacy one and the new one is from another vendor (luckily both of them can export XML).
I am looking for a practical approach to the parallel run. I have been advised to copy the source and target tables into the original database schema, but that does not look to me like the best way to go (and it is not even practical when a schema has many tables).
The DWH I am working on has several DB instances in Oracle and some in SQL Server, a test server and a production one, and for each of them a staging, storage, and data mart area.
Based on this related question and its answer, I am thinking of copying each schema on the go for each project:
Staging in ETL: Best Practices
I am looking for guideline references, but my specific case is the migration from IBM Data Manager to Informatica PowerCenter.
The approach depends on various criteria and personal preferences. Either way you will need to duplicate parts or all of the source and destination systems. At one extreme you can use two instances of the entire system. If you have complex upstream processes that are part of the test, or you have massive numbers of tables and processes, and you have the bandwidth and resources to duplicate your system, then this approach may be optimal.
At the other extreme, if any complex processes occur within the ETL tool itself, or you are simply loading tables and need to check that they are loaded correctly, then making copies of the tables and pointing your new or old tool to the table copies may be the way to go. This method is very simple and easy to set up; a sketch follows.
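For example, the copies and the comparison can be as simple as this (the parallel_run schema and table names are invented; Oracle syntax shown, SQL Server would use SELECT ... INTO and EXCEPT instead):

    -- Create an empty copy of each target table for the new tool to load.
    CREATE TABLE parallel_run.customer_target AS
        SELECT * FROM dwh.customer_target WHERE 1 = 0;

    -- After the old tool has loaded dwh and the new tool has loaded parallel_run,
    -- rows returned here exist in one output but not the other
    -- (run the MINUS in both directions to catch every difference).
    SELECT * FROM dwh.customer_target
    MINUS
    SELECT * FROM parallel_run.customer_target;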
Keep in mind this forum is not meant to replace blogs and in-depth tech articles on those techniques.
We are developing a large data migration from Oracle DB (12c) to another system with SSIS. The developers are using a copy of the production database, and due to the complexity of the data transformation we have to do things in stages by preprocessing data into intermediate helper tables, which are then used further downstream. The problem is that all developers use the same database and screw each other up by running things simultaneously. Does Oracle DB offer anything in terms of developer sandboxing? We could build a mechanism to handle this (e.g. have a dev ID in the helper tables, then query views that map to the dev), but I'd much rather use built-in functionality. Could I use Oracle Multitenant for this?
We ended up producing a master subset database of select schemas/tables through some fairly elaborate PL/SQL, then made several copies of this master schema so each dev has their own sandbox (as Alex suggested). We could have used Oracle Data Masking and Subsetting, but it's too expensive. Another option for creating the subset database would have been to use Jailer. I should note that we didn't have a need to mask any sensitive data. A simplified sketch of the sandbox copies is below.
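The per-developer copies boil down to something like the following (user, schema, and table names are invented; our real version was PL/SQL driven from a list of schemas and tables, and these statements need DBA privileges):

    -- One sandbox schema per developer.
    CREATE USER dev_alice IDENTIFIED BY "change_me"
        DEFAULT TABLESPACE users QUOTA UNLIMITED ON users;
    GRANT CREATE SESSION, CREATE TABLE TO dev_alice;

    -- Copy each subset table from the master subset schema into the sandbox.
    CREATE TABLE dev_alice.helper_orders AS
        SELECT * FROM master_subset.helper_orders;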
Note: I would think this is a fairly common problem, so if new tools and solutions arise, please post them here as answers.