How does AWS DMS validate source and target data? - amazon-aurora

Curious to know how AWS DMS validates that the source and target databases are in sync. Does it validate row by row on each table? If yes, how long does the validation task take to complete?
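For context, DMS data validation is switched on through the task settings and its per-table progress can be inspected afterwards. A rough boto3 sketch (the task ARN and region are placeholders, and the task must be stopped before its settings can be modified):

```python
import json
import boto3

dms = boto3.client("dms", region_name="us-east-1")

# Hypothetical replication task ARN.
task_arn = "arn:aws:dms:us-east-1:123456789012:task:EXAMPLE"

# Enable row-level validation in the task settings.
settings = {"ValidationSettings": {"EnableValidation": True, "ThreadCount": 5}}
dms.modify_replication_task(
    ReplicationTaskArn=task_arn,
    ReplicationTaskSettings=json.dumps(settings),
)

# Inspect per-table validation state and pending/failed row counts.
stats = dms.describe_table_statistics(ReplicationTaskArn=task_arn)
for t in stats["TableStatistics"]:
    print(t["SchemaName"], t["TableName"],
          t.get("ValidationState"), t.get("ValidationFailedRecords"))
```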

Related

When to disable AWS Amplify or AppSync conflict resolution

I noticed that in the new Amplify GraphQL transformer v2, AppSync Conflict Resolution is enabled for all tables by default (https://docs.aws.amazon.com/appsync/latest/devguide/conflict-detection-and-sync.html). I wonder whether it will cause any harm if I disable conflict resolution for my API?
I'm building a Yelp-like rating app, and if two clients try to mutate the same object, I think it's fine to just let them mutate concurrently and have the later request override the earlier one. So I don't really understand what this conflict resolution is useful for.
I also find it really inconvenient that I need to pass in a _version field when mutating an object, and that deleting an object doesn't delete it immediately; instead it sets the _deleted field to true and schedules the delete after the TTL expires.
Thanks very much!
Pro tip: to disable the conflict resolver in Amplify, run amplify update api, and you will be prompted with a choice to disable the conflict resolver.
Versioned Data Sources
AWS AppSync currently supports versioning on DynamoDB data sources. Conflict Detection, Conflict Resolution, and Sync operations require a Versioned data source. When you enable versioning on a data source, AWS AppSync will automatically:
Enhance items with object versioning metadata.
Record changes made to items with AWS AppSync mutations to a Delta table.
Maintain deleted items in the Base table with a “tombstone” for a configurable amount of time.
Versioned Data Source Configuration
When you enable versioning on a DynamoDB data source, you specify the following fields:
BaseTableTTL
The number of minutes to retain deleted items in the Base table with a “tombstone” - a metadata field indicating that the item has been deleted. You can set this value to 0 if you want items to be removed immediately when they are deleted. This field is required.
DeltaSyncTableName
The name of the table where changes made to items with AWS AppSync mutations are stored. This field is required.
DeltaSyncTableTTL
The number of minutes to retain items in the Delta table. This field is required.
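For illustration, these three fields map onto the deltaSyncConfig block when the data source is created. A minimal boto3 sketch, where the API ID, role ARN, region, and table names are all hypothetical:

```python
import boto3

appsync = boto3.client("appsync", region_name="us-east-1")

appsync.create_data_source(
    apiId="abcdefghijklmnopqrstuvwxyz",
    name="CommentsTable",
    type="AMAZON_DYNAMODB",
    serviceRoleArn="arn:aws:iam::123456789012:role/appsync-dynamodb-role",
    dynamodbConfig={
        "tableName": "Comments",          # Base table
        "awsRegion": "us-east-1",
        "versioned": True,                # enable versioning / Conflict Detection
        "deltaSyncConfig": {
            "baseTableTTL": 43200,        # minutes to keep tombstoned items
            "deltaSyncTableName": "ChangeLog",
            "deltaSyncTableTTL": 1440,    # minutes to keep Delta table entries
        },
    },
)
```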
Delta Sync Table
AWS AppSync currently supports Delta Sync Logging for mutations using PutItem, UpdateItem, and DeleteItem DynamoDB operations.
When an AWS AppSync mutation changes an item in a versioned data source, a record of that change will be stored in a Delta table that is optimized for incremental updates. You can choose to use different Delta tables (e.g. one per type, one per domain area) for other versioned data sources or a single Delta table for your API. AWS AppSync recommends against using a single Delta table for multiple APIs to avoid the collision of primary keys.
The schema required for this table is as follows:
ds_pk
A string value that is used as the partition key. It is constructed by concatenating the Base data source name and the ISO8601 format of the date the change occurred. (e.g. Comments:2019-01-01)
ds_sk
A string value that is used as the sort key. It is constructed by concatenating the ISO8601 format of the time the change occurred, the primary key of the item, and the version of the item. The combination of these fields guarantees uniqueness for every entry in the Delta table (e.g. for a time of 09:30:00, an ID of 1a, and a version of 2, this would be 09:30:00:1a:2).
_ttl
A numeric value that stores the timestamp, in epoch seconds, when an item should be removed from the Delta table. This value is determined by adding the DeltaSyncTableTTL value configured on the data source to the moment when the change occurred. This field should be configured as the DynamoDB TTL Attribute.
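A minimal sketch of creating a Delta table with this schema, using the ChangeLog name from the example that follows (region and billing mode are assumptions):

```python
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

# Delta table keyed on ds_pk / ds_sk as described above.
dynamodb.create_table(
    TableName="ChangeLog",
    AttributeDefinitions=[
        {"AttributeName": "ds_pk", "AttributeType": "S"},
        {"AttributeName": "ds_sk", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "ds_pk", "KeyType": "HASH"},
        {"AttributeName": "ds_sk", "KeyType": "RANGE"},
    ],
    BillingMode="PAY_PER_REQUEST",
)

# Mark _ttl as the DynamoDB TTL attribute so expired entries are purged.
dynamodb.get_waiter("table_exists").wait(TableName="ChangeLog")
dynamodb.update_time_to_live(
    TableName="ChangeLog",
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "_ttl"},
)
```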
The IAM role configured for use with the Base table must also contain permission to operate on the Delta table (for example, a Base table called Comments and a Delta table called ChangeLog).
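A minimal sketch of such a permissions policy, attached inline with boto3; the role name, account ID, region, and exact action list are assumptions and should be checked against your own setup:

```python
import json
import boto3

iam = boto3.client("iam")

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:UpdateItem",
            "dynamodb:DeleteItem", "dynamodb:Query", "dynamodb:Scan",
            "dynamodb:BatchGetItem", "dynamodb:BatchWriteItem",
        ],
        "Resource": [
            "arn:aws:dynamodb:us-east-1:123456789012:table/Comments",
            "arn:aws:dynamodb:us-east-1:123456789012:table/Comments/*",
            "arn:aws:dynamodb:us-east-1:123456789012:table/ChangeLog",
            "arn:aws:dynamodb:us-east-1:123456789012:table/ChangeLog/*",
        ],
    }],
}

iam.put_role_policy(
    RoleName="appsync-dynamodb-role",      # hypothetical role name
    PolicyName="AppSyncDeltaSyncAccess",
    PolicyDocument=json.dumps(policy),
)
```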

AWS DMS with CDC: update records only include the updated fields. How to include all of them?

We recently started a continuous migration (initial load + CDC) from an Oracle database on RDS to S3 using AWS DMS. DMS is reading the database with LogMiner.
The problem we have detected is that CDC records of type Update only contain the data that was updated, leaving the rest of the fields empty, so we lose the possibility of simply treating the record with the maximum timestamp value as the valid one.
Does anyone know if this can be changed, or which part of the DMS or RDS configuration to adjust so that the update record contains the values of all the fields?
Thanks in advance.
Supplemental logging at the table level may increase what is logged, but it will also increase the total volume of log data written for a given workload.
Many Log Based Data Replication products from various vendors require additional supplemental logging at the table level to ensure the full row data for updates with before and after change data is written to the database logs.
re: https://docs.oracle.com/database/121/SUTIL/GUID-D857AF96-AC24-4CA1-B620-8EA3DF30D72E.htm#SUTIL1582
Pulling data through LogMiner may be possible, but you will need to evaluate if it will scale with the data volumes you need.
DMS full load/CDC also supports Binary Reader, which is a better option than LogMiner. To capture updates with all the columns, use "ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS" on the Oracle side.
This will push all the columns of an update record to the endpoint, from both Oracle RAC and non-RAC DBs. Also, a pointer for CDC: use TRANSACT_ID on the DMS side to generate a unique sequence for each record. The redo volume will be a little higher, but it is what it is; you can keep an eye on it and drop the supplemental logging at the table level if required.
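A hedged sketch of both steps; the endpoint ARN, schema, and table names are placeholders, and the useLogMinerReader=N;useBfile=Y pair is the documented attribute combination for switching an Oracle source to Binary Reader, but confirm it for your DMS version:

```python
import boto3

dms = boto3.client("dms", region_name="us-east-1")

# Switch the Oracle source endpoint from LogMiner to Binary Reader
# (hypothetical endpoint ARN).
dms.modify_endpoint(
    EndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:EXAMPLE",
    ExtraConnectionAttributes="useLogMinerReader=N;useBfile=Y",
)

# On the Oracle side, run this once per replicated table so updates carry
# every column into the redo log (execute with your Oracle client of choice):
supplemental_logging_sql = (
    "ALTER TABLE myschema.mytable ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS"
)
print(supplemental_logging_sql)
```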
Cheers!

Hasura GraphQL update trigger fires repeatedly

My table columns look like name, email, phone, and pin.
I'm using Hasura for collecting user details.
Problem:
I want to hash the pin field using some hashing algorithm, so I decided to have a separate AWS Lambda function convert the plain pin to a hashed one and update it in the same column.
I then set a trigger (when the pin gets updated, it triggers the webhook). I successfully updated the hashed value in my database. But the problem is that after the Lambda updates the field, Hasura triggers the webhook again, and the process keeps going until I shut down my Hasura instance.
The Hasura documentation mentions the following:
In case of UPDATE, the events are delivered only if the new data is distinct from the old data. The composite type comparison is used to compare the old and new rows. If rows contain columns which cannot be compared using the <> operator, then the internal binary representation of the rows by Postgres is compared.
However, after the Lambda update the data is the same as the old data, so why does it keep calling the webhook?
I think you should use an action for this instead of a trigger. That way, the database only ever stores the hashed pin.
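For illustration, a minimal sketch of the action-handler approach in a Python Lambda behind API Gateway. The users table, the insert_users_one mutation, the field names from the question, and the admin-secret header are all assumptions; adapt them to your schema:

```python
import hashlib
import json
import os
import secrets
import urllib.request

# Hypothetical Hasura endpoint and secret, supplied via environment variables.
HASURA_URL = os.environ.get("HASURA_URL", "https://my-hasura.example.com/v1/graphql")
HASURA_ADMIN_SECRET = os.environ.get("HASURA_ADMIN_SECRET", "")


def hash_pin(pin: str) -> str:
    # PBKDF2 with a random salt; store "salt$hash" in the pin column.
    salt = secrets.token_hex(16)
    digest = hashlib.pbkdf2_hmac("sha256", pin.encode(), salt.encode(), 100_000)
    return f"{salt}${digest.hex()}"


def handler(event, context):
    # Hasura sends the action arguments under "input".
    raw = event.get("body", event)
    body = json.loads(raw) if isinstance(raw, str) else raw
    args = body["input"]

    mutation = """
    mutation($name: String!, $email: String!, $phone: String!, $pin: String!) {
      insert_users_one(object: {name: $name, email: $email, phone: $phone, pin: $pin}) {
        id
      }
    }
    """
    variables = {
        "name": args["name"],
        "email": args["email"],
        "phone": args["phone"],
        "pin": hash_pin(args["pin"]),   # only the hashed pin ever reaches the DB
    }

    req = urllib.request.Request(
        HASURA_URL,
        data=json.dumps({"query": mutation, "variables": variables}).encode(),
        headers={
            "Content-Type": "application/json",
            "x-hasura-admin-secret": HASURA_ADMIN_SECRET,
        },
    )
    with urllib.request.urlopen(req) as resp:
        result = json.loads(resp.read())

    # Return the action output (Lambda proxy format for API Gateway).
    return {"statusCode": 200, "body": json.dumps(result["data"]["insert_users_one"])}
```

Because the insert goes through the action handler, no update trigger fires on the pin column afterwards, which avoids the webhook loop entirely.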

How to parameterise the data connection in Tableau in AWS (cloudformation or otherwise)?

I have a simple web app UI which stores certain dataset parameters (for simplicity, assume they are all data tables in a single Redshift database, but the schema/table name can vary, and the Redshift cluster is in AWS). Tableau is installed on an EC2 instance in the same AWS account.
I am trying to determine an automated way of passing 'parameters' as a data source (i.e. within the connection string inside Tableau on EC2/AWS) rather than manually creating data source connections and inputting the various customer requests.
The flow would be: say 50 users select various parameters in the UI (for simplicity, suppose the parameters are stored as a JSON file in AWS) -> the parameters are sent to Tableau and data sources are created -> the connection is established within Tableau without the customer 'seeing' anything in the back end -> the customer can play with the data in Tableau and create tables and charts accordingly.
How can I do this, at least through a batch job or CloudFormation setup? A "hacky" solution is fine.
Bonus: if the above is doable in real-time across multiple users that would be awesome.
** I am open to using other dashboard UI tools which solve this problem e.g. QuickSight **
After installing Tableau on EC2, I have had trouble finding an article or documentation on how to pass parameters into the connection string itself, or even how to parameterise it manually.
An example: customer1 selects "public_schema.dataset_currentdata" and "public_schema.dataset_yesterday", and another customer selects "other_schema.dataset_currentdata", all of which exist in a single database.
Three data sources should be generated (one for each selection above), but only the data sources a customer selected should be visible to that customer, i.e. customer2 should only see the connection for other_schema.dataset_currentdata.
One hack I was considering is to spin up a CloudFormation template with Tableau installed when a customer makes a request, create the connection accordingly, and delete the stack when they are done. I am mainly unsure how I would get the connection established, i.e. how to pass in the parameters. I am also not sure spinning up 50 EC2 instances is wise. :D
An issue I have seen so far is that creating a manual extract limits the number of rows, so I think I need a live connection per customer request, and I am trying to get around this limitation.
You can do this with a combination of a basic embed and applying filters. This loads the Tableau workbook, and then you apply a filter based on whatever values your user selects from the JSON.
The final missing piece is that you would use a parameter instead of a filter and pass those values to the database via initial SQL.
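One way to wire the UI selection into the embed is to append the values to the view URL as query-string parameters, which Tableau Server/Online views accept for filters and parameters. A small sketch, where the server, site, view path, and parameter name (DatasetParam) are all placeholders:

```python
import json
from urllib.parse import urlencode

# Hypothetical view on a Tableau Server instance running on the EC2 host.
TABLEAU_VIEW = "https://tableau.example.com/t/mysite/views/SalesWorkbook/Overview"


def build_embed_url(selection_json: str) -> str:
    """Turn the customer's saved selection into an embed URL.

    The selection JSON is assumed to look like
    {"dataset": "public_schema.dataset_currentdata"}, and DatasetParam is
    assumed to be a workbook parameter fed to Redshift via initial SQL.
    """
    selection = json.loads(selection_json)
    params = {
        ":embed": "y",                      # render without Tableau chrome
        "DatasetParam": selection["dataset"],
    }
    return f"{TABLEAU_VIEW}?{urlencode(params)}"


print(build_embed_url('{"dataset": "public_schema.dataset_currentdata"}'))
```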

OSB schema validation performance

Are there any performance issues when using Validate nodes in a proxy service inside Oracle Service Bus (OSB)?
What are the best practices when using the Validate node?
What's the time cost of using several Validate nodes, for example:
1. Validate the header
2. Branch depending on the incoming operation
3. Validate the body schema depending on the operation
4. Do an XQuery transformation
5. Validate the schema after the transformation
6. Send the request to the business service
Is the Validate node in step 5, after the XQuery, useful? Doesn't an XQuery transformation assure schema integrity?
Thanks!
Validation does have a performance cost, but generally you validate by default and only reassess when performance is not sufficient (it's likely that performance gains could be found elsewhere first, e.g. by using split-joins or by rationalising multiple OSB nodes into a single XQuery).
Personally, I'd validate the request after the first operational branch (so you know which element to validate against), and then optionally validate the response just before you send it back in the response pipeline.
And no, XQuery transforms do not assure schema integrity. I would not recommend validating after your own XQuery transformation; the result is within your control, so you should test it in other ways (ideally statically) rather than relying on runtime validation.
