In the icCube ETL, there is a data manipulation (data view) called "File dump". I have set-up a couple of them in the ETL process, but none are executed when the data is loaded into icCube.
This is a simple version of what I do:
data source 1 > data view: a > used in FACTS
data source 1 > data view: a > data view: file dump
The file dump is not executed, as I do not see a file on the server.
How to achieve that during load, there is alway a file dump available?
You should ensure the view is actually used in either building the dimensions or the facts. This is a pass through view so it can be used like the underlying the table with no problem.
So in your example "data view: file dump" should be the 'table' feeding the facts.
Hope that helps.
Related
Create a stored procedure that will read the .csv file from oracle server path using read file operation, query the data in some X table and write the output in .csv file.
here after read .csv file, compare .csv file data with table data and need to update few columns in .csv file.
Oracle works best with data in the database. UPDATE is one of the most frequently used commands.
But, modifying a file which resides in some directory seems to be somewhat out of scope. There are other programming languages you should use, I believe. However, if a hammer is the only tool you have, every problem looks like a nail.
I can think of two options.
One is to load file into the database. Use SQL*Loader to do that if file resides on your PC, or - if you have access to the database server and DBA granted you read/write privileges on a directory (an Oracle object which points to a filesystem directory) - use it as an external table. Once you load data, modify it and export it back (i.e. create a new CSV file) using spool.
Another option is to use UTL_FILE package. It also requires access to the database server's directory. Using the A(ppend) option, you can add rows to the original file, but I don't think that you can edit it so this option - at the end - finishes like the previous one - with creating a new file (but this time using UTL_FILE).
Conclusion? Don't use a database management system to modify files. Use another tool.
I have some question regarding the effective way of reading values in DB and generating report.
I use hadoop to see data from multiple tables and do data analysis based on the results.
I want to know if there is effective tool or way which can read data from multiple tables and evaluate if the values of certain columns are same across tables and send report if they are not same... I have 2 options, either I can read data from hadoop or I can connect to DB in DB2 and do it. Without creating a new java program, is there a tool which helps for the same? Like Talend tool which reads XML and writes output in DB ?
You can use Talend for this. Using Talend, you can read data from Hadoop as well as from database. In between you can perform your operation after fetching data and generate report.
if your using alot of data, and do this sort of function alot elasticsearch is also a great help in this area. use ELK stack. although you would not need the 'L' logstash part of this necessarily
I was wondering, for folks familiar with DataStage, if Oracle SQLLDR can be used on DataStage. I have some sets of control files that I would like to incorporate into DataStage. A step by step way of accomplishing this will greatly be appreciated. Thanks
My guess is that you can run it with external stage in data stage.
You simply put the SQLLDR command in the external stage and it will be executed.
Try it and tell me what happens.
We can use ORACLE SQL Loader in DataStage .
If you check Oracle Docs there are two types of fast loading under SQL Loader
1) Direct Path Load - less validation in database side
2) Conventional Path Load
There is less validation in Direct Load if we compare to Conventional Load.
In SQL Loader process we have to specify points like
Direct or not
Parallel or not
Constraint and Index options
Control and Discard or Log files
In DataStage , we have Oracle Enterprise and Oracle Connector Stages
Oracle Enterprise -
we have load option in this stage to load data in fast mode and we can set Environment variable OPTIONS
for Oracle , example is below
OPTIONS(DIRECT=FALSE,PARALLEL=TRUE)
Oracle Connector -
We have bulk load option for it and other properties related to SQL Loader are available in properties tab .
Example - control and discard file values all set by DataStage but you can set these properties and others manually.
As you know SQLLDR basically loads data from files to database so datastage allows you to use any input data file, that would take input in any data file like sequential file, pass them format, pass the schema of the table, and it’ll create an in memory template table, then you can use a database connecter like odbc or db2 etc. and that would load your data in your table, simple as that.
NOTE: if your table does not exist already at the backend then for first execution make it create then set it to append or truncate.
Steps:
Read the data from the file(Sequential File Stage)
Load it using the Oracle Connector(You could use Bulk load so that you could used direct load method using the SQL loader and the data file and control file settings can be configured manually). Bulk Load Operation: It receives records from the input link and passes them to Oracle database which formats them into blocks and appends the blocks to the target table as opposed to storing them in the available free space in the existing blocks.
You could refer the IBM documentation for more details.
Remember, there might be some restriction in loading when it comes to handling rejects, triggers or constraints when you use bulk load. It all depends on your requirement.
I have multiple excel files with two types of metadata, Now i have to push the data into two different tables based on metadata of excel files using SSIS.
There are many, many different ways to do this. You'd need to share a lot more information on how your data is structured to really give a great answer, but here's the general strategy I'd suggest.
In the control flow tab, have a separate data flow for each Excel file. The data flows will all work the same, with the exception of having a different Excel source in each data flow, so it will be enough to get the first version working and then copy and paste for the other files.
In the data flow, use a conditional split transformation to read the metadata coming from Excel and send the row to the correct table.
If you really want to be fancy, however, you could create a child package that includes all your data flow logic. Using the Execute Package Task you can pass the Excel file name to the child package for each Excel file you need to import. This way you consolidate your logic in one package and can still import from multiple Excel files in parallel.
I'm trying to import data from a csv file which, unfortunately, contains multiple data tables. Actually, it's not really a pure csv file.
It contains a header field with some metadata and then the actual csv data parts are separated by:
//-------------
Table <table_nr>;;;;
An example file looks as follows:
Summary;;
Reporting Date;29/05/2013;12:36:18
Report Name;xyz
Reporting Period From;20/05/2013;00:00:00
Reporting Period To;26/05/2013;23:59:59
//-------------
Table 1;;;;
header1;header2;header3;header4;header5
string_aw;0;0;0;0
string_ax;1;1;1;0
string_ay;1;2;0;1
string_az;0;0;0;0
TOTAL;2;3;1;1
//-------------
Table 2;;;
header1;header2;header3;header4
string_bv;2;2;2
string_bw;3;2;3
string_bx;1;1;1
string_by;1;1;1
string_bz;0;0;0
What would be the best way to process load such data using kettle?
Is there a way to split this file into the header and csv data parts and then process each of them as separate inputs?
Thanks in advance for any hints and tips.
Best,
Haes.
I don't think there are any steps that will really help you with data in such a format. You probably need to do some preprocessing before bringing your data into a CSV step. You could still do this in your job, though, by calling out to the shell and executing a command there first, like maybe an awk script to split up the file into its component files and then load those files via the normal Kettle pattern.