Is it possible to connect to Azure Synapse Analytics from an AWS EC2 instance (which is in a VPC)? - amazon-ec2

I have the below existing architecture hosted in an AWS environment.
There is an FDMEE tool configured on EC2 which loads data from HFM (outside AWS) and dumps it into an RDS SQL database; later this data is read by Power BI in Azure (through a gateway). Here we face some issues while refreshing the dataset.
For business reasons, the connection from HFM to FDMEE must stay in AWS; however, the data flow from FDMEE to RDS is subject to change.
So we are looking into the possibility of replacing RDS SQL with Azure Synapse Analytics, so that refreshing data into Power BI becomes more efficient.
So how can I make a stable connection from FDMEE (which is on EC2, in a VPC) to Azure Synapse Analytics?

For data on a private network you need to deploy a Self-Hosted Integration Runtime to load the data into Synapse, or push the data somewhere Synapse can access it directly, like S3 or Azure Storage.
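As a hedged illustration of the second option (pushing the data somewhere Synapse can read it directly), here is a minimal Python sketch that uploads an extract from the EC2 instance to Azure Blob Storage with the azure-storage-blob package; the connection string, container, and file names are placeholders, not values from your setup:

```python
# Minimal sketch: push an extract file from the EC2 instance to Azure Blob
# Storage so Synapse can ingest it directly. Connection string, container,
# and file names are placeholders.
from azure.storage.blob import BlobServiceClient

# Hypothetical storage account connection string (keep it in a secrets
# manager rather than in code).
conn_str = (
    "DefaultEndpointsProtocol=https;AccountName=mystorageacct;"
    "AccountKey=<key>;EndpointSuffix=core.windows.net"
)

service = BlobServiceClient.from_connection_string(conn_str)
blob = service.get_blob_client(container="fdmee-staging", blob="hfm_extract.csv")

with open("/data/exports/hfm_extract.csv", "rb") as f:  # hypothetical FDMEE export
    blob.upload_blob(f, overwrite=True)
```

From there, a Synapse pipeline or COPY statement can pick the file up on a schedule, which keeps the outbound flow from the VPC simple.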

Related

Azure Synapse with Databricks framework for a modern data warehouse

I am working on Databricks. I have curated data in the form of facts and dims. This data is consumed by Synapse for Power BI reporting. I am not sure what the use of Synapse is if the data is already prepared in the Databricks layer. Why are we using Synapse in this framework?
Why are we using Synapse in this framework?
Azure Synapse is an analytics service for data warehousing and large-scale data. Using Azure Synapse, we can combine Azure services like Power BI, Machine Learning, and others.
It offers a number of connectors that make it easier to transfer a sizable volume of data between Azure Databricks and Azure Synapse, and it gives Azure Databricks users a way to connect to Azure Synapse.
Additionally, Azure Synapse offers SQL pools as the compute environment for data warehousing.
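As an illustration of the connector mentioned above, here is a minimal PySpark sketch (run from a Databricks notebook, where `spark` is predefined) that writes a curated fact table to a Synapse dedicated SQL pool; the JDBC URL, storage account, and table names are placeholders, not values from the question:

```python
# Minimal sketch: write a curated Databricks table to Azure Synapse using the
# "com.databricks.spark.sqldw" connector. All names are placeholders.
fact_sales = spark.table("curated.fact_sales")  # hypothetical curated fact table

(fact_sales.write
    .format("com.databricks.spark.sqldw")
    .option("url", "jdbc:sqlserver://myworkspace.sql.azuresynapse.net:1433;"
                   "database=mysqlpool;user=sqladminuser;password=<password>")
    .option("forwardSparkAzureStorageCredentials", "true")
    .option("dbTable", "dbo.fact_sales")
    # Temporary staging area the connector uses to bulk-load data (see the
    # "temp storage" question further down this page).
    .option("tempDir", "abfss://staging@mystorageacct.dfs.core.windows.net/tmp")
    .mode("overwrite")
    .save())
```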

How to create a batch process to upload Oracle DB data to AWS Data Exchange?

I am looking for a way to send data from an Oracle DB to AWS Data Exchange without any manual intervention.
In January 2022, AWS Data Exchange launched support for data sets backed by Amazon Redshift; the same guide referenced by John Rotenstein, above, shows you how you can create a data set using Amazon Redshift datashares. If you are able to move data from the Oracle database to Amazon Redshift, this option may work for you.
AWS Data Exchange also just announced a preview of data sets using AWS Lake Formation, which lets you share data from your Lake Formation data lake and supports Oracle databases running in Amazon Relational Database Service (RDS) or hosted on Amazon Elastic Compute Cloud (EC2). Steps to create this kind of product can be found here.
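For the Redshift datashare option above, a hedged boto3 sketch of the Data Exchange side is shown below; the names and description are placeholders, and the rest of the pipeline (moving data from Oracle into Redshift and publishing revisions) still has to be scheduled separately:

```python
# Minimal sketch: create an AWS Data Exchange data set backed by Amazon
# Redshift datashares with boto3. Names and descriptions are placeholders.
import boto3

dx = boto3.client("dataexchange", region_name="us-east-1")

data_set = dx.create_data_set(
    AssetType="REDSHIFT_DATA_SHARES",    # data set backed by Redshift datashares
    Name="oracle-extract-via-redshift",  # hypothetical product name
    Description="Nightly Oracle extract exposed through a Redshift datashare",
)
print(data_set["Id"])
```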

Why Azure Databricks needs to store data in temp storage in Azure

I was following the tutorial about data transformation with Azure Databricks, and it says that the data transformed by Azure Databricks is first saved to temporary storage in Azure Blob Storage before being loaded into Azure Synapse Analytics. Why does it need to be saved to temp storage before loading into Azure Synapse Analytics?
The Azure storage container acts as an intermediary to store bulk data when reading from or writing to Azure Synapse. Spark connects to the storage container using one of the built-in connectors: Azure Blob storage or Azure Data Lake Storage (ADLS) Gen2.
Under the hood, each HDFS bridge of the Data Movement Service (DMS) on every compute node connects to an external resource such as Azure Blob Storage, and PolyBase then bidirectionally transfers data between SQL Data Warehouse and the external resource, providing fast load performance.
Using PolyBase to extract, load and transform data
The steps for implementing a PolyBase ELT for SQL Data Warehouse are:
1. Extract the source data into text files.
2. Load the data into Azure Blob storage, Hadoop, or Azure Data Lake Store.
3. Import the data into SQL Data Warehouse staging tables using PolyBase.
4. Transform the data (optional).
5. Insert the data into production tables.
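A hedged sketch of steps 3 and 5, assuming the text files have already landed in Blob storage and that the external data source, file format, and table names are hypothetical; the T-SQL is submitted from Python with pyodbc, but it could equally be run directly in the SQL pool:

```python
# Minimal sketch: load staged text files into a SQL Data Warehouse / dedicated
# SQL pool staging table with PolyBase, then insert into a production table.
# All object names, the storage location, and the credentials are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=mydw.sql.azuresynapse.net;DATABASE=mysqlpool;"
    "UID=sqladminuser;PWD=<password>;Encrypt=yes;"
)
conn.autocommit = True
cur = conn.cursor()

# Step 3: expose the files in Blob storage as an external table (assumes an
# external data source and a delimited-text file format already exist), then
# copy them into a staging table with CTAS.
cur.execute("""
CREATE EXTERNAL TABLE dbo.ext_sales_stage (
    sale_id INT, amount DECIMAL(18,2), sold_on DATE
)
WITH (
    LOCATION = '/sales/',                -- folder with the extracted text files
    DATA_SOURCE = MyAzureBlobStorage,    -- hypothetical external data source
    FILE_FORMAT = MyTextFileFormat       -- hypothetical file format
);
""")
cur.execute("""
CREATE TABLE dbo.sales_stage
WITH (DISTRIBUTION = ROUND_ROBIN)
AS SELECT * FROM dbo.ext_sales_stage;
""")

# Step 5: insert the (optionally transformed) rows into the production table.
cur.execute("INSERT INTO dbo.sales SELECT * FROM dbo.sales_stage;")
conn.close()
```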

Migrate Azure VM (Oracle) to AWS RDS Aurora

We need to migrate an Oracle DB from an Azure VM to AWS RDS Aurora, and we have a checklist of things to take care of while migrating from Oracle to Aurora.
But what would be the best approach for the migration: migrate the Azure Oracle VM to AWS EC2 and then migrate to RDS, or migrate directly from the Azure VM to AWS RDS Aurora using some service such as DMS, Data Pump, or SCT?
(I am not familiar with Azure DMS / DB related services)
I would go directly from the Azure Oracle VM to AWS RDS.
For the tool, take a look at the AWS Database Migration Service (DMS). It lets you connect your source database (Azure Oracle) to your target database (AWS RDS for Oracle). AWS DMS handles creating the schema and tables in the target database.
DMS deploys on dedicated VMs for your migration and is priced based on the size of the EC2 instance you need, from a t2.micro ($0.43/day) to an r4.8xlarge ($80/day + storage cost) and everything in between. Data transfer into DMS is free, and transfer to an RDS instance in the same AZ (availability zone) as the DMS instance is also free.
A few features that make DMS nice include:
Continuous Data Replication: once your initial migration is complete, it can continue to replicate the data until you are ready to make the switch. This is nice because you do your migration ahead of time and have plenty of time for verification before you switch the application over.
Schema Conversion Tool: not useful in your case, but if you were migrating to a different database engine, such as Oracle to Aurora, it would handle the schema conversion for you.
You can learn more and get started at AWS Database Migration Service.
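Below is a hedged boto3 sketch of the DMS pieces involved (a replication instance, source/target endpoints, and a full-load-plus-CDC task); every identifier, hostname, and credential is a placeholder:

```python
# Minimal sketch: set up an AWS DMS migration with boto3. All identifiers,
# hostnames, and credentials are placeholders.
import json
import boto3

dms = boto3.client("dms", region_name="us-east-1")

instance = dms.create_replication_instance(
    ReplicationInstanceIdentifier="oracle-to-rds-migration",
    ReplicationInstanceClass="dms.t3.medium",
    AllocatedStorage=100,
)

source = dms.create_endpoint(
    EndpointIdentifier="azure-oracle-source",
    EndpointType="source",
    EngineName="oracle",
    ServerName="azure-vm.example.com",  # hypothetical Azure VM hostname
    Port=1521,
    Username="migration_user",
    Password="<password>",
    DatabaseName="ORCL",
)

target = dms.create_endpoint(
    EndpointIdentifier="rds-oracle-target",
    EndpointType="target",
    EngineName="oracle",
    ServerName="mydb.xxxxxxxx.us-east-1.rds.amazonaws.com",
    Port=1521,
    Username="admin",
    Password="<password>",
    DatabaseName="ORCL",
)

# In practice, wait for the replication instance to become available before
# creating the task.
task = dms.create_replication_task(
    ReplicationTaskIdentifier="oracle-full-load-and-cdc",
    SourceEndpointArn=source["Endpoint"]["EndpointArn"],
    TargetEndpointArn=target["Endpoint"]["EndpointArn"],
    ReplicationInstanceArn=instance["ReplicationInstance"]["ReplicationInstanceArn"],
    MigrationType="full-load-and-cdc",  # initial load plus continuous replication
    TableMappings=json.dumps({
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-app-schema",
            "object-locator": {"schema-name": "APP", "table-name": "%"},
            "rule-action": "include",
        }]
    }),
)
print(task["ReplicationTask"]["ReplicationTaskArn"])
```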

How can I securely transfer my data from on-prem HDFS to Google Cloud Storage?

I have a bunch of data in an on-prem HDFS installation. I want to move some of it to Google Cloud (Cloud Storage) but I have a few concerns:
How do I actually move the data?
I am worried about moving it over the public internet
What is the best way to move data securely from my HDFS store to Cloud Storage?
To move data from an on-premises Hadoop cluster to Google Cloud Storage, you should probably use the Cloud Storage connector for Hadoop. You can install the connector on any cluster by following the install directions. As a note, Google Cloud Dataproc clusters have the connector installed by default.
Once the connector is installed, you can use DistCp to move the data from your HDFS to Cloud Storage. This will transfer data over the (public) internet unless you have a dedicated interconnect set up with Google Cloud. To this end, you can use a Squid proxy and configure the Cloud Storage connector to use it.
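Here is a minimal sketch of the DistCp step, wrapped in Python and routed through a hypothetical Squid proxy via the connector's fs.gs.proxy.address setting; the bucket, paths, and proxy address are placeholders:

```python
# Minimal sketch: copy an HDFS directory to Cloud Storage with DistCp, once the
# Cloud Storage connector is installed on the cluster. The bucket, paths, and
# proxy address are placeholders.
import subprocess

subprocess.run(
    [
        "hadoop", "distcp",
        # Route connector traffic through a Squid proxy (optional; only needed
        # if outbound access is restricted to the proxy).
        "-D", "fs.gs.proxy.address=squid-proxy.internal:3128",
        "hdfs:///data/warehouse/events",           # source on the on-prem cluster
        "gs://my-target-bucket/warehouse/events",  # destination bucket
    ],
    check=True,
)
```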
