I am new to Talend ETL and am using Talend Open Studio for Big Data version 6.2.
I have developed a simple Talend ETL job that picks up data from a tFileInputExcel and a tOracleInput (date dimension) and inserts the data into my local Oracle database.
Below is how my job looks:
The job runs, but 0 rows are inserted into my local Oracle database.
Your picture shows that no rows come out of your tMap component. Verify that the links inside the tMap are correct.
It seems that no data matches between fgf.LIBELLE_MOIS and row2.B.
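One way to check this outside Talend is to compare the two key columns directly. The following is a minimal Python sketch, not Talend code. The sample values and the helper names `normalize` and `inner_join_keys` are made up for illustration, and it assumes the mismatch comes from case or surrounding whitespace, which may not be the actual cause in this job.

```python
def normalize(value):
    """Trim whitespace and fold case so 'Janvier ' and 'JANVIER' compare equal."""
    return str(value).strip().casefold()


def inner_join_keys(left_keys, right_keys, key_func=lambda v: v):
    """Return the left keys that have a match on the right, like an inner join."""
    right = {key_func(k) for k in right_keys}
    return [k for k in left_keys if key_func(k) in right]


# Hypothetical sample values standing in for fgf.LIBELLE_MOIS and row2.B.
libelle_mois = ["JANVIER", "FEVRIER", "MARS"]
excel_column_b = ["Janvier ", "fevrier", " Mars"]

# Exact string comparison, which is how an unmodified join key behaves.
exact = inner_join_keys(libelle_mois, excel_column_b)
# Comparison after trimming and case folding both sides.
relaxed = inner_join_keys(libelle_mois, excel_column_b, key_func=normalize)

print(len(exact), "rows match exactly")
print(len(relaxed), "rows match after normalization")
```

If the relaxed comparison matches and the exact one does not, applying the same trimming and case conversion in the tMap join expression is one possible fix.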
I made a job design that consists of tFileInputDelimited -> tMap -> tDBOutput (Oracle).
The CSV I am using has columns that are not currently in the table, which I don't think should be a problem, but when I run my job I get multiple ORA-00904 invalid identifier errors.
I checked my DB in Oracle SQL Developer and no rows have been updated.
I am looking for some help with how to fix this. I looked up the error and was referred to SQL code, but I am not using SQL, only a CSV file to upload.
Thank you!
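You say that your CSV has columns that are not in your table. That is a problem if you map those columns to the tMap output. tDBOutput builds its INSERT statement from the output schema, so any column that does not exist in the table triggers ORA-00904 even though you never wrote SQL yourself. Only the columns that are present in your target table need to be in the tMap output flow going to tDBOutput.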
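To illustrate the idea outside Talend, here is a small Python sketch that keeps only the CSV fields that exist in the target table and builds an INSERT naming just those columns. The table name `MY_TABLE`, the column list, and the sample CSV are hypothetical, and the sketch assumes every target column is present in the CSV.

```python
import csv
import io

# Hypothetical target table columns (as they would exist in the Oracle table).
TARGET_COLUMNS = ["ID", "NAME", "AMOUNT"]


def project_rows(csv_text, target_columns):
    """Keep only the CSV fields whose names exist in the target table."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Columns present in the file but not in the table; these must not be mapped.
    extra = [c for c in reader.fieldnames if c not in target_columns]
    rows = [{c: row[c] for c in target_columns} for row in reader]
    return rows, extra


def build_insert(table, columns):
    """Build a parameterized INSERT that names only the given columns."""
    placeholders = ", ".join(f":{i + 1}" for i in range(len(columns)))
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"


sample_csv = "ID,NAME,AMOUNT,COMMENT\n1,Alice,10.5,first\n2,Bob,7,second\n"
rows, extra = project_rows(sample_csv, TARGET_COLUMNS)

print("Columns to leave out of the output flow:", extra)
print(build_insert("MY_TABLE", TARGET_COLUMNS))
```

Here `COMMENT` is reported as an extra column. If it were left in the output schema, the generated INSERT would reference a column the table does not have.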
I came across a scenario in my project. I am loading data from a file to a table using ODI, and I am running my interfaces through a load plan. I have 1000 records in my source file and I also get 1000 records in the target, but when I check the ODI load plan execution log, it shows that the number of inserts is 2000. Can anyone please help? Or is it an ODI bug?
The number of inserts shows not only the inserts into the target table but also all the inserts happening in temporary tables. Depending on the knowledge modules (KMs) used in an interface, ODI might load data into a C$_ table (LKM) or an I$_ table (IKM/CKM). The rows loaded into these tables are also counted.
You can look at the code generated in the Operator to check whether your KMs are using these temporary tables. You can also simulate an execution to see the generated code.
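The arithmetic behind the 2000 can be sketched in a few lines of Python. The step names and the `C$_TARGET` table name below are illustrative only, and the sketch assumes a single LKM staging step followed by a single insert into the target.

```python
# Hypothetical step log: (step name, table written to, rows inserted).
steps = [
    ("LKM: load data into staging", "C$_TARGET", 1000),
    ("IKM: insert new rows", "TARGET", 1000),
]


def total_inserts(step_log):
    """Sum inserts over every step, the way the execution log reports them."""
    return sum(rows for _, _, rows in step_log)


def inserts_into(step_log, table):
    """Sum inserts for one table only."""
    return sum(rows for _, written, rows in step_log if written == table)


print("Reported inserts:", total_inserts(steps))
print("Rows in target:", inserts_into(steps, "TARGET"))
```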
I am new to Hadoop and big data and am trying to figure out whether I can move my data store to HBase. I have come across a problem that some of you might be able to help me with.
I have an HBase table "hbase_testTable" with the column family "ColFam1". I have set the number of versions of "ColFam1" to 10, as I have to maintain a history of up to 10 updates to this column family. This works fine. When I add new rows through the HBase shell with an explicit timestamp value, it also works fine. Basically, I want to use the timestamp as my version control, so I specify the timestamp as
put 'hbase_testTable', '1001', 'ColFam1:q1', '1000$', 3
where 3 is my version, and everything works fine.
Now I am trying to integrate this with a Hive external table, and I have all the mappings set to match the HBase table, as below:
create external table testtable (id string, q1 string, q2 string, q3 string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,colfam1:q1, colfam1:q2, colfam1:q3")
TBLPROPERTIES ("hbase.table.name" = "testtable", "transactional" = "true");
This works fine with normal insertion: inserts update the HBase table, and vice versa.
Even though the external table is marked as "transactional", I am not able to update the data in Hive. It gives me an error:
FAILED: SemanticException [Error 10294]: Attempt to do update or delete
using transaction manager that does not support these operations
That said, any updates made to the HBase table are reflected immediately in the Hive table.
I can update the HBase table through the Hive external table by inserting new data for the column under the same "rowid".
Is it possible to control the timestamp being written to the referenced HBase table (like 4, 5, 6, 7, etc.)? Please help.
The timestamp is one of the important elements of HBase versioning. You are trying to supply your own timestamp, which works fine at the HBase level.
One point: you should be very careful to keep the timestamps unique and non-negative. You can look at custom versioning in the book HBase: The Definitive Guide.
Now you have Hive on top of HBase. As per the documentation,
there is currently no way to access the HBase timestamp attribute, and queries always access data with the latest timestamp.
That's for the reading part. For putting data, you can look here.
It still says that you have to give a valid timestamp and not any other value.
Future versions are expected to expose the timestamp attribute.
I hope this gives you a better idea of how to deal with custom timestamps in Hive-HBase integration.
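The versioning behaviour discussed above can be pictured with a toy Python model. This is not the HBase API: the class `VersionedCell` and its methods are hypothetical, and the model simplifies real HBase, where old versions are physically removed at compaction rather than on every put.

```python
class VersionedCell:
    """Toy model of one HBase cell that keeps at most max_versions values."""

    def __init__(self, max_versions=10):
        self.max_versions = max_versions
        self.versions = {}  # timestamp -> value

    def put(self, value, timestamp):
        """Store a value under an explicit, user-chosen timestamp."""
        if timestamp < 0:
            raise ValueError("timestamp must be non-negative")
        # Reusing a timestamp overwrites the earlier value, so keep them unique.
        self.versions[timestamp] = value
        # Drop the oldest versions beyond the configured limit.
        for ts in sorted(self.versions)[:-self.max_versions]:
            del self.versions[ts]

    def get_latest(self):
        """What a Hive query would see: only the newest timestamp."""
        ts = max(self.versions)
        return ts, self.versions[ts]

    def get_all(self):
        """All retained versions, newest first."""
        return sorted(self.versions.items(), reverse=True)


cell = VersionedCell(max_versions=3)
for ts, value in [(3, "1000$"), (4, "1100$"), (5, "1200$"), (6, "1300$")]:
    cell.put(value, ts)

print(cell.get_latest())
print(cell.get_all())
```

With a limit of 3, the value written at timestamp 3 is no longer retained after the fourth put, and the latest read returns the value at timestamp 6.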
I need to do an Oracle data migration from 11g to 12c where schema changes are abundant. I have an Excel sheet that describes all the schema changes. The sheet has columns for 'old_table_name', 'old_column_name', 'old_value' and the same for the new tables. Some values can be copied directly to the new table and some cannot.
For example, I have to transform the old column value when it is moved to the new table. Some transformations are complex and cannot simply be mapped; they have to be done by joining with other tables in the old database. I tried the Talend Open Studio Data Integration tool for this and found it a bit complex to use in my case. Does anyone have an idea of how to get this done using Talend or any other tool? What is the ideal approach for a migration like this? I have included a sample of the Excel sheet below, which only has simple transformations.
The kinds of conversions shown in the spreadsheet can all be performed on the table itself using rename statements and/or basic DDL and DML statements. I would load the old table into the new database and run these statements on it.
alter table old_table_one rename to new_table_one;
alter table new_table_one rename column old_col_one to new_col_one;
update new_table_one
set new_col_one = 'A_NEW'
where new_col_one = 'A';
etc.
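Since the mappings already live in a spreadsheet, statements like these could be generated rather than typed by hand. Below is a rough Python sketch of that idea. The key names 'new_table_name', 'new_column_name' and 'new_value' are assumed to mirror the old_* columns described in the question, the helper names are made up, and the sketch only covers the simple one-to-one cases, not transformations that need joins.

```python
def quote(value):
    """Quote a string literal for SQL, doubling embedded single quotes."""
    return "'" + str(value).replace("'", "''") + "'"


def migration_statements(mappings):
    """Build rename and update statements for simple one-to-one mappings."""
    statements = []
    renamed_tables = set()
    renamed_columns = set()
    for m in mappings:
        old_t, new_t = m["old_table_name"], m["new_table_name"]
        old_c, new_c = m["old_column_name"], m["new_column_name"]
        old_v, new_v = m.get("old_value"), m.get("new_value")
        # Rename each table only once, even if it appears in several rows.
        if old_t != new_t and old_t not in renamed_tables:
            statements.append(f"ALTER TABLE {old_t} RENAME TO {new_t}")
            renamed_tables.add(old_t)
        # Rename each column only once per table.
        if old_c != new_c and (new_t, old_c) not in renamed_columns:
            statements.append(
                f"ALTER TABLE {new_t} RENAME COLUMN {old_c} TO {new_c}"
            )
            renamed_columns.add((new_t, old_c))
        # Emit a value update only when a value actually changes.
        if old_v is not None and old_v != new_v:
            statements.append(
                f"UPDATE {new_t} SET {new_c} = {quote(new_v)} "
                f"WHERE {new_c} = {quote(old_v)}"
            )
    return statements


# One row shaped like the spreadsheet described in the question.
sample = [{
    "old_table_name": "old_table_one", "new_table_name": "new_table_one",
    "old_column_name": "old_col_one", "new_column_name": "new_col_one",
    "old_value": "A", "new_value": "A_NEW",
}]

for statement in migration_statements(sample):
    print(statement + ";")
```

The output is meant to be reviewed as a script before it is run. Values are inlined as literals, and chained value changes (A to B, then B to C) would need ordering care that this sketch does not handle.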
Hey EXPERIENCED SSIS DEVELOPERS, I need your help.
High-Level Requirements
Query SQL Server table (on a different server than my SSIS server) resulting in about 200-300k records results set.
Use three output columns from each row to look up date in Oracle database.
Insert or Update SQL Server table with results.
Use SSIS.
SQL Server 2008
Sounds easy, right?
Here is what I have done:
1. Created a Control Flow Execute SQL Task that gets a recordset from SQL Server. Very fast, easy query, like select field1, field2, field3 from table where condition > 0. That's it. Takes less than a second.
2. Created a variable (evaluated as expression) for the Oracle query that uses the result set from the above in the WHERE clause.
3. Created a ForEachLoop Container that takes the results (from #1 above) for each row in the recordset and runs it through a Data Flow that uses the Oracle query (from #2 above) with Data access mode: SQL command from variable against an Oracle data source. Fast, simple query with only about 6 columns returned.
4. Data Conversion - obvious reasons - changing 3 columns from Oracle data types to SQL Server data types.
5. OLE DB Destination to insert to SQL Server using Fast Load to staging table.
It works perfectly! Hooray! Bad news - it is very, very slow. When I say slow, I mean it processes 3000 records per hour. Holy moly - so freaking slow.
Question: am I missing a way to speed it up? It seems like the ForEachLoop Container is the bottleneck. Growl.
Important Points:
- I have NO write access in Oracle environment, so don't even suggest a potential solution that requires it. Not a possibility. At all.
- Oracle sources do not allow for direct parameter definition. So no SELECT FIELD FROM TABLE WHERE ?. Don't suggest it - doesn't work.
Ideas
- Should I find a way to break down the results of the Execute SQL task and send them through several ForEachLoop Containers for faster processing?
- Is there another design that is more appropriate?
- Is there a script I can use that is faster?
- Would it be faster to create a temporary table in memory and populate it - then use the results to bulk insert to SQL Server? Does this work when using an Oracle data source?
ANY OTHER IDEAS?