Can a Fivetran destination table be used for writing after canceling synchronisation on this table?

I have multiple Fivetran destination tables in Snowflake. Those tables were created by Fivetran itself, and Fivetran currently writes data into them. Now I would like to stop syncing data into one of the tables and start writing to that table from a different source. Would I run into any trouble with this? Should I do anything else to make it possible?

What you describe is not possible because of how Fivetran works. Each connector source writes to one destination schema, and one only. Swapping destination tables between connectors is not a supported feature as of now.

Related

Data aggregation during data load to Snowflake using Snowpipe

I am evaluating Snowflake for a reporting use case and am considering Snowpipe for ETL. Data is ingested from S3, and the data in S3 contains information about user sessions captured at regular intervals. In Snowflake, I want to store this data in aggregated form. As per the documentation, Snowflake supports only basic transformations and doesn't support GROUP BY and JOIN while copying data from the S3 stage into Snowflake tables.
I am new to ETL and Snowflake. One approach I was considering is to load the raw, detailed data from the stage into a temporary table in Snowflake, then run aggregations (GROUP BY and JOIN) on the temporary table to load the data into the final fact tables. Is this the correct approach for implementing complex transformations?
Temporary tables in Snowflake only persist for the session in which they were created, which means you won't be able to point a Snowpipe at one.
Instead of a temporary table, point Snowpipe at a transient table to store the raw data, and then truncate that table after some period of time to reduce costs. Personally, I'd keep the data in the transient table for as long as possible, provided it is not too cost-prohibitive, to account for potentially late-arriving data and the like.
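A minimal sketch of that landing setup, assuming JSON files on the S3 stage; the table, stage, and pipe names are hypothetical, and the storage integration/credentials for the stage are omitted:

-- Transient landing table for the raw session events.
CREATE TRANSIENT TABLE raw_user_sessions (
    user_id    NUMBER,
    session_id STRING,
    event_ts   TIMESTAMP_NTZ,
    payload    VARIANT
);

-- External stage over the S3 bucket (add your storage integration or credentials).
CREATE STAGE s3_sessions_stage
  URL = 's3://my-bucket/sessions/'
  FILE_FORMAT = (TYPE = JSON);

-- Snowpipe that continuously copies newly arrived files into the landing table.
CREATE PIPE sessions_pipe AUTO_INGEST = TRUE AS
  COPY INTO raw_user_sessions
  FROM (
    SELECT $1:user_id::NUMBER,
           $1:session_id::STRING,
           $1:event_ts::TIMESTAMP_NTZ,
           $1
    FROM @s3_sessions_stage
  );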
Yes, your approach looks good to me. Snowpipe loads your data continuously from S3 into Snowflake, and within Snowflake you use views, tables, and stored procedures to transform the data and load it into your final fact table.
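For the transformation step, one possible sketch (reusing the raw_user_sessions landing table above, plus hypothetical dim_user, fact-table, and warehouse names) is a scheduled task that performs the GROUP BY / JOIN that COPY itself cannot:

-- Scheduled task that aggregates the raw landing table into the fact table.
CREATE OR REPLACE TASK load_fact_sessions
  WAREHOUSE = etl_wh
  SCHEDULE  = '15 MINUTE'
AS
  INSERT INTO fact_session_summary (user_id, session_day, session_count, event_count)
  SELECT r.user_id,
         DATE_TRUNC('day', r.event_ts) AS session_day,
         COUNT(DISTINCT r.session_id)  AS session_count,
         COUNT(*)                      AS event_count
  FROM raw_user_sessions r
  JOIN dim_user u
    ON u.user_id = r.user_id
  GROUP BY r.user_id, DATE_TRUNC('day', r.event_ts);

-- Tasks are created suspended; resume the task to start the schedule.
ALTER TASK load_fact_sessions RESUME;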

How to do real-time data ingestion from transactional tables into a flat table

We have transactional tables in Oracle, and for reporting purposes we need this data transferred in real time to a flat Oracle table in another database. The performance of the report is great when it reads from this flat table.
Currently we are using GoldenGate for replication to the other database and a materialized view to populate the flat table, but due to some problems we need to switch to another way of populating/maintaining it. What options do we have?
It is a pretty basic requirement, but the solutions I can find are all geared towards batch processing. I would also welcome any other solutions you feel would better serve this purpose. Changing the target database to something else is also an option, as there may be more such reports coming.

Why does the user need write permission on the location of an external Hive table?

In Hive, you can create two kinds of tables: Managed and External
In the case of a managed table, you own the data, and hence when you drop the table the data is deleted.
In the case of an external table, you don't have ownership of the data, and hence when you drop such a table the underlying data is not deleted; only the metadata is.
Now, I have recently observed that you cannot create an external table over a location on which you don't have write (modification) permission in HDFS. I completely fail to understand this.
Use case: it is quite common that the data you are processing is huge and read-only. So, to process such data via Hive, do you have to copy it to a location on which you have write permission?
Please help.
Though it is true that dropping an external table does not drop the underlying data, this does not mean that external tables are read-only. For instance, you should be able to do an INSERT OVERWRITE on an external table.
That being said, it is definitely possible to use (internal) tables when you only have read access, so I suspect this is the case for external tables as well. Try creating the table with an account that has write access, and then using it with your regular account.
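For reference, a minimal sketch of an external table declared over an existing HDFS directory; the table name, columns, and path are hypothetical. Since writes such as INSERT OVERWRITE are legal on external tables, that is one plausible reason Hive asks for write permission on the location at creation time:

-- External table over existing files; DROP TABLE removes only the metadata,
-- never the files under LOCATION.
CREATE EXTERNAL TABLE web_logs (
    ip          STRING,
    request_ts  TIMESTAMP,
    url         STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION 'hdfs:///data/shared/web_logs';

-- A write like this is allowed on an external table, which may be why Hive
-- wants write (modification) permission on the directory:
-- INSERT OVERWRITE TABLE web_logs
-- SELECT ip, request_ts, url FROM staging_logs;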

How to implement an ETL Process

I would like to implement synchronization between a source SQL-based database and a target triplestore.
However, for the sake of simplicity, let's just say two databases. I wonder what approaches to use so that every change in the source database is replicated in the target database. More specifically, I would like a process that, each time some row changes in the source database, can see those changes, read them, and populate the target database accordingly while applying some transformation in the middle.
I have seen suggestions around notification mechanisms that may be available in the database, around building tables in which changes can be tracked (meaning doing it manually) and having the process poll them at intervals, and around the use of logs (change data capture, etc.).
I'm seriously puzzled by all of this. I wonder if anyone could give some guidance and an explanation of the different approaches with respect to my objective, meaning the names of the methods and where to look.
My organization mostly uses Postgres and Oracle databases.
I have to take relational data, transform it into RDF so as to store it in a triplestore, and keep that triplestore constantly synchronized with the data in the SQL store.
Please,
Many thanks
PS:
A clarification of the difference between ETL and replication techniques such as change data capture, with respect to my overall objective, would be appreciated.
Again, I need to make sense of the subject and learn what the methods are, so I can start digging further by myself. So far I have understood that CDC is the new way to go.
Assuming you can't use replication and you need some kind of ETL process to actually extract, transform, and load all changes into the destination database, you could use insert, update, and delete triggers to fill a (manually created) audit table. Give it columns such as GeneratedId, TableName, RowId, Action (insert, update, delete), and a boolean value that records whether your ETL process has already handled the change. Use that table to get all the changed rows in your database and transport them to the destination database, then delete the processed rows from the audit table so that it doesn't grow too big. How often you have to run the ETL process depends on the amount of change occurring in the source database.
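A minimal sketch of that audit-table approach in PostgreSQL (one of the databases mentioned above); the customers source table and all object names are hypothetical:

-- Audit table filled by the triggers; the ETL process reads unprocessed rows.
CREATE TABLE etl_audit (
    generated_id BIGSERIAL PRIMARY KEY,
    table_name   TEXT        NOT NULL,
    row_id       BIGINT      NOT NULL,
    action       TEXT        NOT NULL,            -- 'INSERT', 'UPDATE' or 'DELETE'
    processed    BOOLEAN     NOT NULL DEFAULT FALSE,
    changed_at   TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Row-level trigger function that records every change to the source table.
CREATE OR REPLACE FUNCTION audit_customers() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        INSERT INTO etl_audit (table_name, row_id, action)
        VALUES (TG_TABLE_NAME, OLD.id, TG_OP);
        RETURN OLD;
    ELSE
        INSERT INTO etl_audit (table_name, row_id, action)
        VALUES (TG_TABLE_NAME, NEW.id, TG_OP);
        RETURN NEW;
    END IF;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER customers_audit
AFTER INSERT OR UPDATE OR DELETE ON customers
FOR EACH ROW EXECUTE FUNCTION audit_customers();  -- PostgreSQL 11+

-- The ETL job then selects rows WHERE processed = FALSE, transforms them
-- (for example into RDF for the triplestore), loads the target, and finally
-- deletes or flags the audit rows it has handled.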

Compare data of a table from two different databases

I have two separate databases on two separate servers. Both databases have the same table. I just want to compare these tables with respect to the data they contain.
Also, to access one database from the other, do I need to create a DB link?
Have you tried searching Google? There are plenty of posts on this topic.
Use the documentation to learn about the DBMS_COMPARISON package.
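A hedged sketch of both parts, with hypothetical link, schema, table, and credential names (note that DBMS_COMPARISON requires a primary key or a suitable index on the compared table):

-- Database link from the local database to the remote server.
CREATE DATABASE LINK remote_db
  CONNECT TO app_user IDENTIFIED BY app_password
  USING 'REMOTE_TNS_ALIAS';

-- Define the comparison between the local and remote copies of the table.
BEGIN
  DBMS_COMPARISON.CREATE_COMPARISON(
    comparison_name    => 'cmp_orders',
    schema_name        => 'APP',
    object_name        => 'ORDERS',
    dblink_name        => 'REMOTE_DB',
    remote_schema_name => 'APP',
    remote_object_name => 'ORDERS');
END;
/

-- Run the comparison; COMPARE returns TRUE when both copies are identical.
DECLARE
  scan_info  DBMS_COMPARISON.COMPARISON_TYPE;
  consistent BOOLEAN;
BEGIN
  consistent := DBMS_COMPARISON.COMPARE(
                  comparison_name => 'cmp_orders',
                  scan_info       => scan_info,
                  perform_row_dif => TRUE);
  IF consistent THEN
    DBMS_OUTPUT.PUT_LINE('Tables match');
  ELSE
    DBMS_OUTPUT.PUT_LINE('Differences found; scan id ' || scan_info.scan_id);
  END IF;
END;
/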
