Loading multiple files with the same XML schema (ETL)

I've defined an XML schema in Talend, using an XML file from one provider. I have multiple providers that I need to handle separately, but they share the same XML format.
I only want to define the XML schema once, but use it in multiple jobs, each with a different file name. The XML schema seems to be tied to a filename, however, and changing the filename turns it into a built-in type. I don't want a built-in type, as I want changes to the XML schema to happen in one place.
Can somebody point me in the right direction? Should this be done using context?

It is possible to define a schema for a given file (using the wizards provided or by building it yourself) and then reuse just that schema by simply choosing it from the repository.
So, as an example, you might wish to loop through a folder full of XML files and read them using the same schema for all of them and then load this into a database:
To do this you would start with a tFileList that points to the folder full of XML files. Set this up as usual (you probably want a filemask of "*.xml") and then link it via an Iterate flow to a tFileInputXML component, specifying the file name as: ((String)globalMap.get("tFileList_1_CURRENT_FILEPATH")).
Now select Repository from the drop-down box next to Schema (it defaults to Built-In). From there, simply select the XML schema previously defined for the single file. Now you use just that schema definition but can change everything else (you probably only want control over the file name, leaving the rest as is).
Now you can simply connect it to a database component of your choice, such as tMysqlOutput, and have the database component insert rows as usual.

This is very common, but unfortunately there's no elegant solution.
Context variables are limited to (almost) just primitive types, so the only way to do this is to define the XML schema metadata in the repository and then switch the component to built-in just to change the filename. This is very ugly, but AFAIK it's the only solution possible at the moment.


How to access 12c report metadata?

I am looking for a method (or even better a DW table) that contains report properties such as Name, Description, Type, Location, etc.
I have looked through many tables but cannot find this information. I am working to build a web portal that includes hyperlinks to all reports on the server.
Here is an example of the report properties I am looking for.
Unfortunately, the definitions you're looking for are not stored at the database level, which is super lame, but that's the way it is. They're stored in the RPD file and the web catalog at the OS level.
The webcatalog is located:
on 10G: OracleBIData/web/catalog/
on 11G:
$ORACLE_INSTANCE/bifoundation/OracleBIPresentationServicesComponent/catalog/
on 12c: $ORACLE_HOME\user_projects\domains\bi\bidata\service_instances\ssi\metadata\content\catalog where ssi is a service instance.
If you descend into one of those directory structures you'll see files that are named with a bunch of punctuation symbols, plus the name of the report they represent.
Just to clarify the "lame" storage: What the OP is asking for is in the presentation catalog; the RPD has nothing to do with it.
And to clarify even further: every object stored in the presentation catalog is physically represented by two files on disk: one file without a file extension, which holds the object's XML definition, and one file with an .atr extension, which holds the object's properties (what the OP is looking for) as well as the object's access permissions.
Ranting's fine, but please be precise ;-)
For what it's worth, in E-Business Suite, tables start with XDO_

How to change Folder structure template for Importing/Comparing a Database?

Is there a way to customize the folder structure for objects which get created in the project when importing a Database, or doing a schema comparison?
Is there a customizable template file used by Visual Studio to control how generated objects are included in the solution?
Example:
By default, a Table and all its Indexes get created in "Tables" Folder, in a single file.
I would like to split these into separate files. Same goes for Statistics.
Here is an outline of the two folder structures:
- Server Object Explorer
- Solution Explorer (what I would like it to look like)
Note:
I know that when doing comparison I can prepare a folder in my solution, then drag the desired objects, but that is not an acceptable solution.
As of today, I don't think there is a way to specify folder structures apart from these default import options (Schema, Object type, schema\object type).
Well, you can split them up manually and create them in separate folders, but that's a lot of manual work depending on how many objects you have. The publish/script generation/compare operations are not affected by this move, though.

Serializing query result

I have a financial system with all its business logic located in the database, and I have to code an automated workflow for transactions batch processing, which consists of the steps listed below:
A user or an external system inserts some data in a table
Before further processing, a snapshot of this data in the form of a CSV file with a digital signature has to be made. The CSV snapshot itself and its signature have to be saved in the same input table. The program then updates the successfully signed rows to make them available for the further steps of the code
...further steps of code
The obvious trouble is step 2: I don't know how to assign the results of a query, as a BLOB representing a CSV file, to a variable. It seems like basic stuff, but I couldn't find it. The CSV format was chosen by the users because it is human-readable. The signing itself can be done with a request to an external system, so that's not an issue.
Restrictions:
there is no application server which could process the data, so I have to do it with PL/SQL
there is no way to save a local file, everything must be done on the fly
I know that normally one would do all the work on the application layer or with some local files, but unfortunately this is not the case.
Any help would be highly appreciated, thanks in advance
I agree with @william-robertson. You just need to create a comma-delimited values string (say, a header row plus data rows) and write that to a CLOB. I recommend an insert trigger (there are lots of SQL tricks you can use to make that easier). Usage of that CSV string will need to be owned by the part of the application that reads it in and needs to do something with it.
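A minimal sketch of that approach, assuming a hypothetical txn_input table with txn_id, amount, and currency columns (the signing call and the write-back are left out):

    DECLARE
        l_csv  CLOB;
        l_line VARCHAR2(4000);
    BEGIN
        DBMS_LOB.CREATETEMPORARY(l_csv, TRUE);
        -- header row
        l_line := 'TXN_ID,AMOUNT,CURRENCY' || CHR(10);
        DBMS_LOB.WRITEAPPEND(l_csv, LENGTH(l_line), l_line);
        -- one data row per unprocessed record
        -- (real CSV needs quoting/escaping if values can contain commas)
        FOR r IN (SELECT txn_id, amount, currency
                    FROM txn_input
                   WHERE status = 'NEW') LOOP
            l_line := r.txn_id || ',' || r.amount || ',' || r.currency || CHR(10);
            DBMS_LOB.WRITEAPPEND(l_csv, LENGTH(l_line), l_line);
        END LOOP;
        -- l_csv now holds the snapshot: hand it to the signing routine
        -- and save it back into the input table as required
    END;
    /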
I understand you stated you need to create a CSV, but see if you could do XML instead. Then you could use DBMS_XMLGEN to generate the necessary snapshot into a database column directly from the query for it.
I do not accept the idea that a CSV is human-readable (actually try it sometime as straight text). What is valid is that Excel displays it in human-readable form, but Excel should also be able to display the XML in human-readable form. Further, if needed, the data in it can be ported directly back into the original columns.
Just an alternative idea.
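For what it's worth, a minimal sketch of the DBMS_XMLGEN route, reusing the hypothetical txn_input table from the previous answer:

    DECLARE
        l_xml CLOB;
    BEGIN
        -- serialize the query result into an XML document in one call
        l_xml := DBMS_XMLGEN.GETXML(
                     'SELECT txn_id, amount, currency
                        FROM txn_input
                       WHERE status = ''NEW''');
        -- l_xml can now be signed and stored just like the CSV variant
    END;
    /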

Building Oracle DB; Good Directory Layout

I'm looking for advice on how to best organize a new Oracle schema and dependent files in my project directory - with the sequences, triggers, DDL, etc. I've been using one monolithic file called schema.sql for some time, but I'm wondering if there's a best practice? Something like...
database/
  tables/
    person.sql
    group.sql
  sequences/
    person.sequence
    group.sequence
  triggers/
    new_person.trigger
Penny for your thoughts or a URL that I may have missed!
Thank you!
Storing DDL by object type is a reasonable approach-- anything is likely to be easier to navigate than a monolithic SQL script. Personally, though, I'd much rather have DDL organized by function. If you're building an accounting system, for example, you probably have a series of objects to manage accounts payable and a separate set of objects to manage accounts receivable along with some core objects for managing the general ledger accounts. That would lead to something along the lines of
database/
  general_ledger/
    tables/
    packages/
    sequences/
  accounts_receivable/
    tables/
    packages/
    sequences/
  accounts_payable/
    tables/
    packages/
    sequences/
As the system gets more complex, that hierarchy would naturally get deeper over time. This sort of approach would more naturally mirror the way non-database code is stored in source control. You wouldn't have a single directory of Java classes in a directory structure like
middle_tier/
  java/
    Foo.java
    Bar.java
You would organize the classes that implement the same sorts of business logic together and separate from the classes that implement different bits of business logic.
One item to consider is those SQLs which can act as 'latest only' scripts. These include CREATE OR REPLACE PROCEDURE/FUNCTION/TRIGGER etc. You run the latest version and you are not worried about what may have previously existed in the database.
On the other hand you have tables where you may start off with a CREATE TABLE followed by several ALTER TABLEs as changes to the schema evolve. And if you are doing an upgrade you may want to apply several of the ALTER TABLE scripts (preferably in order).
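To make the distinction concrete, a small sketch (the object names are invented for illustration):

    -- "latest only": safe to rerun at any time, replaces the previous version
    CREATE OR REPLACE PROCEDURE person_cleanup AS
    BEGIN
        NULL;  -- body omitted
    END;
    /

    -- evolutionary: the initial CREATE plus ordered ALTERs applied on upgrade
    CREATE TABLE person (id NUMBER PRIMARY KEY);
    ALTER TABLE person ADD (first_name VARCHAR2(100));
    ALTER TABLE person ADD (last_name  VARCHAR2(100));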
I'd argue against a 'functional grouping' unless it is really obvious where the lines are drawn. You probably don't want to be in a position where you have a USERS table in one group and a USER_AUTHORITIES in another and an AUTHORITY group in a third.
If you do have decent separation, then they are probably in separate schemas and you do want to keep schemas distinct (since you can have the same object names in different schemas).
The division-by-object-type arrangement, with the addition of a "schema" directory below the database directory works well for me.
I've worked with source control systems that have the additional division-by-function layer - if there are many objects it adds additional searching if you're trying to cross-reference the source control file with the object that you see in a database GUI navigator that generally groups objects by type. It's also not always clear how an object should be classified this way.
Consider adding a "grants" directory for the grants made by that schema to other schemas or roles, with one file per grantee. If you have "rule-based" grants such as "the APPLICATION_USER role always gets SELECT on all of schema X's tables", then write a PL/SQL anonymous block to perform this action. (You might be tempted to reverse-engineer the grants after they get put in place by some ad-hoc method, but it's easy to miss something when new tables or views are added to the application).
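A minimal sketch of such a rule-based grant block, using the role name from the example:

    BEGIN
        -- the APPLICATION_USER role always gets SELECT on all of this
        -- schema's tables; rerun whenever new tables are added
        FOR t IN (SELECT table_name FROM user_tables) LOOP
            EXECUTE IMMEDIATE 'GRANT SELECT ON "' || t.table_name
                              || '" TO application_user';
        END LOOP;
    END;
    /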
Standardize on a delimiter for all scripts and you'll make your life easier if you start deploying through a build utility such as Ant. Using "/" (vs. ";") works for both SQL statements as well as PL/SQL anonymous blocks.
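For instance, both of the following run unchanged under a single "/" convention in SQL*Plus (the table name is carried over from the earlier sketch):

    -- a plain SQL statement, terminated by "/" on its own line
    UPDATE person
       SET last_name = TRIM(last_name)
    /

    -- a PL/SQL anonymous block, terminated the same way
    BEGIN
        DBMS_OUTPUT.PUT_LINE('deployment step complete');
    END;
    /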
In our projects we use a somewhat combined approach: we have the core of our program at the root and other functionality in subfolders:
root/
  plugins/
    auth/
    mail/
    report/
    etc.
In all these folders we have both DDL and DML scripts, and almost all of them can be run more than once: e.g. all packages are defined as create or replace..., all data insertion scripts check whether the data already exists, and so on. This gives us the ability to run almost all scripts without worrying that we might break something.
Obviously this scenario can't be applied to create table and similar statements. For these scripts we manually wrote a small bash script that extracts the specified files and runs them without failing on particular ORA errors, like ORA-00955: name is already used by an existing object.
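The same effect can also be achieved in PL/SQL rather than bash by trapping the specific error code; a sketch, with the table definition as a placeholder:

    DECLARE
        e_name_in_use EXCEPTION;
        PRAGMA EXCEPTION_INIT(e_name_in_use, -955);  -- ORA-00955
    BEGIN
        EXECUTE IMMEDIATE 'CREATE TABLE cl_oper (id NUMBER PRIMARY KEY)';
    EXCEPTION
        WHEN e_name_in_use THEN
            NULL;  -- the object already exists, so the script stays rerunnable
    END;
    /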
Also, all files are mixed in the directories but differ by extension: .seq for a sequence, .tbl for a table, .pkg for a package interface, .bdy for a package body, .trg for a trigger, and so on...
We also have a naming convention with prefixes for all of our files: we can have a cl_oper.tbl table with its cl_oper.seq sequence and cl_oper.trg trigger, and cl_oper_processing.pkg together with cl_oper_processing.bdy holding the logic for the mentioned objects. With this naming convention, it's very easy in a file manager to see all the files connected with some unit of logic in our project (whereas grouping in directories by object type does not provide this).
Hope this information helps you somehow. Please leave comments if you have any questions.

Implementing user-defined db parameters/properties in Oracle

OK, the question title probably isn't the best, but I'm looking for a good way to implement an extensible set of parameters for Oracle database applications that "stay with" the host/instance. By "stay with", I mean that I'd like to rule out just having an Oracle table of name/value pairs that would have to be modified if I create a test/QA instance by cloning the production instance. (For example, imagine a parameter called email_error_address that should be set to prod_support@abc.com in production and qa_support@abc.com in testing.)
These parameters need to be accessed from both PL/SQL code running in the database as well as client-side code. I started out doing this by overloading the plsql_cc_flags init parameter (not a solution I'm proud of), but this is getting messy to maintain and parse.
[Edit]
Ideally, the implementation would allow changes to the list without restarting the instance, similar to the dynamically-modifiable init parameters.
You want to have a separate set of values for each environment. You want these values to be independent of the data, so that they don't get overridden if you import data from another instance.
The solution is to use an external table (providing you are on 9i or higher). Because external tables hold the data in an OS file they are independent of the database. To apply changed values all you need to do is overwrite the OS file.
All you need to do is ensure that the files for each environment are kept separate. This is easy enough if Test, QA, Production, etc. are on their own servers. If they are on the same server then you will need to distinguish them by file name or directory path; in either case you may need to issue a bit of DDL to correct the location in the event of a database refresh.
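A sketch of such an external parameter table; the directory object, file name, and column layout are illustrative only:

    -- one-time setup: a directory object pointing at the config location
    CREATE DIRECTORY app_config_dir AS '/u01/app/config';

    CREATE TABLE app_parameters (
        param_name   VARCHAR2(64),
        param_value  VARCHAR2(4000)
    )
    ORGANIZATION EXTERNAL (
        TYPE ORACLE_LOADER
        DEFAULT DIRECTORY app_config_dir
        ACCESS PARAMETERS (
            RECORDS DELIMITED BY NEWLINE
            FIELDS TERMINATED BY '='
        )
        LOCATION ('app_parameters.txt')
    );

    -- the OS file holds lines like: email_error_address=prod_support@abc.com
    SELECT param_value
      FROM app_parameters
     WHERE param_name = 'email_error_address';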
The drawback to using external tables is that they can be a bit of a performance overhead - they are really intended for bulk loading. If this is likely to be a problem you could use caching, with a user-defined namespace or CONTEXT. Load the values into memory using DBMS_SESSION.SET_CONTEXT(), either on demand or with an ON LOGON trigger. Retrieve the values with wrapper calls to SYS_CONTEXT(). Because the namespace is in session memory, retrieval is quite fast. René Nyffenegger has a simple example of working with CONTEXT: check it out.
While I've been writing this up I see you have added a requirement to change things on the fly. As I have said already, this is easy with an OS file, but the use of caching makes things slightly more difficult. The solution would be to use a globally accessible CONTEXT. Have a routine which loads all the values at startup, which you can also call whenever you refresh the OS file.
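A sketch of the caching side, reusing the hypothetical app_parameters external table from above; note that DBMS_SESSION.SET_CONTEXT may only be called from the package named in CREATE CONTEXT:

    -- a globally accessible namespace, bound to the one package allowed to set it
    CREATE OR REPLACE CONTEXT app_params USING param_pkg ACCESSED GLOBALLY;

    CREATE OR REPLACE PACKAGE param_pkg AS
        PROCEDURE load_params;
    END param_pkg;
    /
    CREATE OR REPLACE PACKAGE BODY param_pkg AS
        PROCEDURE load_params IS
        BEGIN
            -- cache every file-based value in memory; call this at startup
            -- and again whenever the OS file is overwritten
            FOR p IN (SELECT param_name, param_value FROM app_parameters) LOOP
                DBMS_SESSION.SET_CONTEXT('app_params', p.param_name, p.param_value);
            END LOOP;
        END load_params;
    END param_pkg;
    /

    -- fast retrieval from SQL or PL/SQL:
    SELECT SYS_CONTEXT('app_params', 'email_error_address') FROM dual;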
You could use environment variables that you can set per oracle user (the account that starts up the Oracle database) or per server. The environment variables can be read with the DBMS_SYSTEM.GET_ENV procedure.
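A sketch of that call; DBMS_SYSTEM is a SYS-owned (and undocumented) package, so the calling schema needs an explicit EXECUTE grant, and the variable name here is invented:

    DECLARE
        l_value VARCHAR2(4000);
    BEGIN
        -- reads an OS environment variable of the account that started the instance
        SYS.DBMS_SYSTEM.GET_ENV('EMAIL_ERROR_ADDRESS', l_value);
        DBMS_OUTPUT.PUT_LINE('email_error_address = ' || l_value);
    END;
    /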
I tend to use a system_parameters table. If you're concerned about it being overwritten, put it in its own schema and create a public synonym.
@APC's answer is clever.
You could solve the performance overhead by adding a materialized view on top of the external table(s). You would refresh it after RMAN-cloning, and after each update of the config files.
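A sketch, again against the hypothetical app_parameters external table:

    CREATE MATERIALIZED VIEW app_parameters_mv
        BUILD IMMEDIATE
        REFRESH COMPLETE ON DEMAND
        AS SELECT param_name, param_value FROM app_parameters;

    -- after overwriting the OS config file (e.g. following a clone):
    BEGIN
        DBMS_MVIEW.REFRESH('APP_PARAMETERS_MV');
    END;
    /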
