Is Informatica PowerCenter no longer available as an independent product? All I can see on their website is cloud, cloud and cloud.
I am working on Databricks. I have curated data in the form of facts and dims. This data is consumed for Power BI reporting through Synapse. I am not sure what the use of Synapse is if the data is already cooked in the Databricks layer. Why are we using Synapse in this framework?
why we are using synapse in this framework
Azure Synapse is an analytics service for data warehousing and big data. With Azure Synapse we can combine Azure services like Power BI, Machine Learning, and others.
It offers a number of connectors that make it easier to transfer large volumes of data between Azure Databricks and Azure Synapse, and it gives Azure Databricks users a mechanism to connect to Azure Synapse.
Additionally, Azure Synapse offers SQL pools for the compute environment and for data warehousing.
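As a rough illustration (not a prescription for your framework), here is a minimal PySpark sketch of pushing a curated Databricks table into a Synapse dedicated SQL pool using the Databricks Synapse ("sqldw") connector; the workspace, SQL pool, storage account, and table names are placeholders.

```python
# Minimal sketch: write a curated fact table from Databricks into a Synapse
# dedicated SQL pool via the built-in Synapse ("sqldw") connector.
# `spark` is the ambient SparkSession in a Databricks notebook.
# All names below (table, workspace, pool, storage account) are placeholders.
df = spark.read.table("gold.fact_sales")  # curated fact table in Databricks

(df.write
   .format("com.databricks.spark.sqldw")
   .option("url", "jdbc:sqlserver://<workspace>.sql.azuresynapse.net:1433;database=<sql_pool>")
   .option("forwardSparkAzureStorageCredentials", "true")
   .option("dbTable", "dbo.fact_sales")
   .option("tempDir", "abfss://<container>@<storage_account>.dfs.core.windows.net/tmp")
   .mode("overwrite")
   .save())
```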
We are going to move from SQL server to Snowflake as our target database for the warehouse.
Today most of our ETL development is done in ODI (Oracle Data Integrator).
So I'm interested to know whether anyone is using ODI together with Snowflake and how it's working.
And what experience/recommendations you have with other ETL tools using Snowflake as the target.
For example:
Matillion
DBT
Xplenty
Today we have started using NiFi to move the data from the source to Azure Blob Storage.
But we are not sure if ODI is the right tool for the rest once we are in the cloud.
I'm really looking forward to seeing all your answers.
Snowflake supports transformations both during loading (ETL) and after loading (ELT).
Snowflake works with a wide range of data integration tools, including Informatica, Talend, Tableau, Matillion and others.
In data engineering, new tools and self-service pipelines are eliminating traditional tasks such as manual ETL coding and data cleaning. With easy ETL or ELT options via Snowflake, data engineers can instead spend more time working on critical data strategy and pipeline optimization projects.
With Snowflake as your data lake and data warehouse, ETL can be effectively eliminated, as no pre-transformations or pre-defined schemas are needed.
In addition, Snowflake Snowpark is designed to make building complex data pipelines a breeze and to allow developers to interact with Snowflake directly without moving data. Read more about Snowpark here.
https://www.snowflake.com/trending/etl-tools
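As a small, hedged example of what the Snowpark Python API looks like (connection parameters and table names below are placeholders), an ELT-style transformation that runs entirely inside Snowflake might look like this:

```python
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, sum as sum_

# Placeholder connection parameters; use your own account and credentials.
session = Session.builder.configs({
    "account": "<account>",
    "user": "<user>",
    "password": "<password>",
    "warehouse": "<warehouse>",
    "database": "<database>",
    "schema": "<schema>",
}).create()

# Transform already-loaded raw data (ELT) without moving it out of Snowflake:
# aggregate a hypothetical RAW.ORDERS table into a daily summary table.
orders = session.table("RAW.ORDERS")
daily = (orders
         .group_by(col("ORDER_DATE"))
         .agg(sum_(col("AMOUNT")).alias("TOTAL_AMOUNT")))
daily.write.save_as_table("ANALYTICS.DAILY_ORDERS", mode="overwrite")
```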
Since you have started transferring data from the source to Azure Blob Storage, I assume you have an Azure subscription, and it is possible that your Snowflake account itself is hosted in an Azure region.
In this case, I recommend using Azure Data Factory directly, so you have everything with one provider, plus support for data migration from SQL Server.
Link to documentation: Copy and transform data in Snowflake using Azure Data Factory
I am looking for a way to send data from an Oracle DB to AWS Data Exchange without any manual intervention.
In January 2022, AWS Data Exchange launched support for data sets backed by Amazon Redshift; the same guide referenced by John Rotenstein, above, shows you how you can create a data set using Amazon Redshift datashares. If you are able to move data from the Oracle database to Amazon Redshift, this option may work for you.
AWS Data Exchange has also just announced a preview of data sets using AWS Lake Formation, which lets you share data from your Lake Formation data lake; Lake Formation supports Oracle databases running in Amazon Relational Database Service (RDS) or hosted on Amazon Elastic Compute Cloud (EC2). Steps to create this kind of product can be found here.
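As a hedged sketch (data set name and region are placeholders, and it assumes the Redshift datashare has already been populated from Oracle), creating such a data set programmatically with boto3 might start like this:

```python
import boto3

# Sketch: create a Redshift-datashare-backed data set in AWS Data Exchange.
# Assumes data has already been moved from Oracle into Amazon Redshift and
# shared via a datashare; the name, description, and region are placeholders.
dx = boto3.client("dataexchange", region_name="us-east-1")

data_set = dx.create_data_set(
    AssetType="REDSHIFT_DATA_SHARE",
    Name="oracle-extract",
    Description="Data staged from Oracle into an Amazon Redshift datashare",
)

# Each published version of the data set is a revision.
revision = dx.create_revision(DataSetId=data_set["Id"])
print(data_set["Id"], revision["Id"])
```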
We are trying to do a proof of concept on Informatica Big Data Edition (not the cloud version), and I have seen that we might be able to use HDFS and Hive as source and target. But my question is: does Informatica connect to Cloudera Impala? If so, do we need any additional connector for that? I have done comprehensive research to check whether this is supported but could not find anything. Has anyone already tried this? If so, can you specify the steps and link to any documentation?
Informatica version: 9.6.1 (Hotfix 2)
You can use the ODBC driver provided by Cloudera.
http://www.cloudera.com/downloads/connectors/impala/odbc/2-5-22.html
For Irene: you can use the same driver; the one above is based on the Simba driver.
http://www.simba.com/drivers/hbase-odbc-jdbc/
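Once the driver is installed and a DSN is configured, a quick connectivity check from Python via pyodbc might look like this (the DSN, credentials, and table are placeholders):

```python
import pyodbc

# Assumes the Cloudera Impala ODBC driver is installed and a DSN named
# "Cloudera_Impala" has been configured; all names here are placeholders.
conn = pyodbc.connect("DSN=Cloudera_Impala;UID=etl_user;PWD=secret", autocommit=True)
cur = conn.cursor()
cur.execute("SELECT COUNT(*) FROM my_db.my_table")  # hypothetical Impala table
print(cur.fetchone()[0])
conn.close()
```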
I have the option of using Sqoop or Informatica Big Data Edition to source data into HDFS. The source systems are Teradata and Oracle.
I would like to know which one is better, and the reasoning behind it.
Note:
My current utility is able to pull data into HDFS using Sqoop, create a Hive staging table, and archive to an external table.
Informatica is the ETL tool used in the organization.
Regards
Sanjeeb
Sqoop
Sqoop is capable of performing full and incremental loads from Oracle/Teradata (see the sketch after this list).
Sqoop copies data from source systems in parallel.
Sqoop scripts can be custom-generated and scheduled by Oozie.
Open source solution for any size cluster. No license cost.
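For reference, here is a minimal sketch of an incremental, parallel Sqoop import from Oracle, wrapped in Python so it could be scheduled; the connection string, table, columns, and paths are placeholders, and the same flags work directly from the shell or from an Oozie action.

```python
import subprocess

# Sketch of an incremental Sqoop import from Oracle into HDFS.
# Connection string, table, columns, and paths below are placeholders.
sqoop_cmd = [
    "sqoop", "import",
    "--connect", "jdbc:oracle:thin:@//dbhost:1521/ORCL",
    "--username", "etl_user",
    "--password-file", "/user/etl/.oracle_pwd",   # avoid passwords on the CLI
    "--table", "SALES",
    "--incremental", "append",                    # only pull new rows
    "--check-column", "SALE_ID",
    "--last-value", "1000000",
    "--num-mappers", "8",                         # parallel copy from the source
    "--target-dir", "/data/staging/sales",
]
subprocess.run(sqoop_cmd, check=True)
```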
Informatica
Best interface in the ETL industry for managing mappings.
Does not provide parallel copy options. Provides a Hive mode for parallel processing, which basically converts transformations into Hive queries for execution. Also supports pushdown to generate MapReduce code.
Licensing cost is per node. If you plan 500 Hadoop nodes for future data storage, you pay 10 times as much as for a 50-node cluster when you scale the cluster.
Informatica BDE is a relatively new product in the market. INFA Developer will be useful for working on big data. There are challenges in supporting all the latest Hadoop platform features in Informatica, as well as traditional RDBMS features like sequence generation, stateful mappings, sessions, and Lookup transformations in Informatica BDE.
Informatica MDM does not support Hadoop.
If price is a criterion for decision-making, go for Sqoop. If you want to leverage the flexibility of switching Hadoop platform tools, use Sqoop (the Sqoop project is also thinking of moving over to Spark).
If you are tied to Informatica for some reason, go for Informatica. But most Informatica developers want to move to Hadoop technologies.
Although this was asked a year ago, sharing new features in Informatica:
Informatica BDM version 10.1 supports Sqoop connectivity, i.e. you can use Sqoop to read data from an RDBMS and load it into Hadoop/Hive.
Also, there are many new features in BDM version 10.2, especially the parameterization support in the developer tool and dynamic mappings.
The tool-versus-hand-coding debate has always been there.
The Informatica tool gives an enterprise-level solution that is easier to maintain.
BDM 10.1.1 supports Sqoop with the Spark engine. Spark 2.0.1 is supported in this version, so performance is pretty good.
BDM 10.2 has just been released with new features like stateful variable support, which was missing in earlier versions.
Sqoop must be used for the data exchange. You have lots of options with which you can achieve optimal performance. Also, if you try to exchange data along RDBMS (Teradata/Oracle) <-> Informatica <-> Hadoop cluster, the data would first need to be brought to the Informatica server, which may involve additional I/O.
If the data processing must be done within Hive, Informatica BDE must be used.