My table columns look like name, email, phone, and pin.
I'm using Hasura for collecting user details.
Problem:
I want to hash the pin field using a hashing algorithm, so I decided to use a separate AWS Lambda function that converts the plain pin to a hashed one and writes it back to the same column.
I then set up an event trigger: when pin is updated, it calls the webhook. The hashed value is written to my database successfully. The problem is that after the Lambda updates the field, Hasura triggers the webhook again, and this keeps going until I shut down my Hasura instance.
The Hasura documentation says:
In case of UPDATE, the events are delivered only if new data is
distinct from old data. The composite type comparison is used to
compare the old and new rows. If rows contain columns, which cannot be
compared using <> operator, then internal binary representation of
rows by Postgres is compared.
However, after the Lambda update the data is the same as the old data, so why does the webhook keep being called?
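A likely explanation follows from the quoted rule itself: the Lambda writes a hash, which differs from the plain pin, so Hasura sees a changed row and delivers another event; the Lambda then hashes the hash, producing yet another distinct value, and so on. The sketch below simulates that loop in memory. The "sha256$" prefix used to recognise already-hashed values is a hypothetical convention, and unsalted SHA-256 is used only to keep the demo deterministic, not as a recommended PIN scheme.

```python
import hashlib

PREFIX = "sha256$"  # hypothetical marker identifying already-hashed values


def hash_pin(value: str) -> str:
    # Unsalted SHA-256 keeps the demo deterministic; not a real PIN scheme.
    return PREFIX + hashlib.sha256(value.encode()).hexdigest()


def count_deliveries(initial_pin: str, guarded: bool, max_events: int = 5) -> int:
    """Count webhook deliveries after a user sets `initial_pin`.

    The loop mimics the documented rule: an UPDATE event is delivered only
    when the new row differs from the old one.
    """
    old, new, events = None, initial_pin, 0
    while new != old and events < max_events:
        events += 1                      # Hasura delivers the event
        if guarded and new.startswith(PREFIX):
            break                        # Lambda sees a hash and skips the write
        old, new = new, hash_pin(new)    # Lambda writes a new, different value
    return events


print(count_deliveries("1234", guarded=False))  # 5: stops only because of the cap
print(count_deliveries("1234", guarded=True))   # 2: the user's update plus the Lambda's own write
```

A guard like this stops the loop, but the trigger still fires once for the Lambda's own write, and the plain pin is stored briefly before being replaced.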
I think you should use an Action for this instead of an event trigger. That way, the database only ever stores the hashed pin.
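With an Action, the handler receives the plain pin and can hash it before anything is stored, so no update trigger on the pin column is involved. Below is a minimal sketch of the hashing part, assuming a hypothetical registerUser action whose arguments (name, email, phone, pin) arrive under the request payload's input key. PBKDF2 from Python's standard library stands in for "some hashing algorithm", and the step that actually inserts the row (for example a mutation sent back to Hasura) is omitted.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative work factor; tune for your environment


def hash_pin(pin: str) -> str:
    # A fresh random salt per pin; salt and iteration count are stored with the hash.
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", pin.encode(), salt, ITERATIONS)
    return f"pbkdf2_sha256${ITERATIONS}${salt.hex()}${digest.hex()}"


def verify_pin(pin: str, stored: str) -> bool:
    # Recompute with the stored salt and iterations, then compare in constant time.
    _, iterations, salt_hex, digest_hex = stored.split("$")
    digest = hashlib.pbkdf2_hmac("sha256", pin.encode(), bytes.fromhex(salt_hex), int(iterations))
    return hmac.compare_digest(digest.hex(), digest_hex)


def handle_register_user(payload: dict) -> dict:
    """Body of a hypothetical registerUser Action handler: builds the row to insert."""
    args = payload["input"]  # the action's arguments
    return {
        "name": args["name"],
        "email": args["email"],
        "phone": args["phone"],
        "pin": hash_pin(args["pin"]),  # only the hash ever reaches the database
    }
```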
I noticed that in the new Amplify GraphQL Transformer v2, AppSync conflict resolution is enabled for all tables by default (https://docs.aws.amazon.com/appsync/latest/devguide/conflict-detection-and-sync.html). Will it cause any harm if I disable conflict resolution for my API?
I'm building a Yelp-like rating app, and if two clients try to mutate the same object, I think it's fine to just let them mutate concurrently and have the later request override the earlier one. So I don't really understand what conflict resolution is useful for.
I find it really inconvenient that I need to pass in a _version field when mutating an object, and that deleting does not remove the item immediately: instead, its _deleted field is set to true and it is scheduled for deletion after the TTL expires.
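The _version field is what makes conflict detection possible: each mutation states which version the client read, so a write based on stale data can be detected instead of silently overwriting a newer one. The in-memory sketch below illustrates only that optimistic-concurrency idea plus a tombstone-style delete; it is not AppSync's implementation, and the Review type and function names are hypothetical.

```python
from dataclasses import dataclass


class ConflictError(Exception):
    """Raised when a mutation was based on a stale version."""


@dataclass
class Review:
    rating: int
    version: int = 1
    deleted: bool = False


def update_rating(store: dict, key: str, rating: int, expected_version: int) -> int:
    item = store[key]
    # Reject the write if someone else changed the item since this client read it.
    if item.deleted or item.version != expected_version:
        raise ConflictError(f"{key}: expected v{expected_version}, found v{item.version}")
    item.rating = rating
    item.version += 1
    return item.version


def soft_delete(store: dict, key: str, expected_version: int) -> None:
    item = store[key]
    if item.version != expected_version:
        raise ConflictError(f"{key}: expected v{expected_version}, found v{item.version}")
    # Tombstone: the item stays (marked deleted) so syncing clients can
    # presumably learn about the deletion; a TTL would purge it later.
    item.deleted = True
    item.version += 1


store = {"r1": Review(rating=3)}
update_rating(store, "r1", 4, expected_version=1)      # succeeds, item is now v2
# update_rating(store, "r1", 5, expected_version=1)    # would raise ConflictError
```

If last-writer-wins is acceptable for your data, this check buys you little for plain mutations; the linked page notes that Sync operations require a versioned data source, so that feature is what you would give up.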
Thanks very much!
Pro tip: to disable the conflict resolver in Amplify, run amplify update api; you will be prompted with an option to disable conflict resolution.
Versioned Data Sources
AWS AppSync currently supports versioning on DynamoDB data sources. Conflict Detection, Conflict Resolution, and Sync operations require a Versioned data source. When you enable versioning on a data source, AWS AppSync will automatically:
Enhance items with object versioning metadata.
Record changes made to items with AWS AppSync mutations to a Delta table.
Maintain deleted items in the Base table with a “tombstone” for a configurable amount of time.
Versioned Data Source Configuration
When you enable versioning on a DynamoDB data source, you specify the following fields:
BaseTableTTL
The number of minutes to retain deleted items in the Base table with a “tombstone” - a metadata field indicating that the item has been deleted. You can set this value to 0 if you want items to be removed immediately when they are deleted. This field is required.
DeltaSyncTableName
The name of the table where changes made to items with AWS AppSync mutations are stored. This field is required.
DeltaSyncTableTTL
The number of minutes to retain items in the Delta table. This field is required.
Delta Sync Table
AWS AppSync currently supports Delta Sync Logging for mutations using PutItem, UpdateItem, and DeleteItem DynamoDB operations.
When an AWS AppSync mutation changes an item in a versioned data source, a record of that change will be stored in a Delta table that is optimized for incremental updates. You can choose to use different Delta tables (e.g. one per type, one per domain area) for other versioned data sources or a single Delta table for your API. AWS AppSync recommends against using a single Delta table for multiple APIs to avoid the collision of primary keys.
The schema required for this table is as follows:
ds_pk
A string value that is used as the partition key. It is constructed by concatenating the Base data source name and the ISO8601 format of the date the change occurred. (e.g. Comments:2019-01-01)
ds_sk
A string value that is used as the sort key. It is constructed by concatenating the ISO 8601 format of the time the change occurred, the primary key of the item, and the version of the item. The combination of these fields guarantees uniqueness for every entry in the Delta table (e.g. for a time of 09:30:00, an ID of 1a, and a version of 2, this would be 09:30:00:1a:2).
_ttl
A numeric value that stores the timestamp, in epoch seconds, when an item should be removed from the Delta table. This value is determined by adding the DeltaSyncTableTTL value configured on the data source to the moment when the change occurred. This field should be configured as the DynamoDB TTL Attribute.
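To make the three Delta table fields concrete, the sketch below builds them for the Comments example used above. AppSync generates these values itself; this only mirrors the described format, and the use of UTC and second precision for the time part are assumptions based on the documentation's example. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def delta_record_keys(base_name: str, item_id: str, version: int,
                      changed_at: datetime, delta_ttl_minutes: int) -> dict:
    """Build ds_pk, ds_sk and _ttl as described for the Delta table."""
    ds_pk = f"{base_name}:{changed_at.date().isoformat()}"            # e.g. Comments:2019-01-01
    ds_sk = f"{changed_at.strftime('%H:%M:%S')}:{item_id}:{version}"  # e.g. 09:30:00:1a:2
    # _ttl = moment of change + DeltaSyncTableTTL, expressed in epoch seconds.
    expires = changed_at + timedelta(minutes=delta_ttl_minutes)
    return {"ds_pk": ds_pk, "ds_sk": ds_sk, "_ttl": int(expires.timestamp())}


keys = delta_record_keys("Comments", "1a", 2,
                         datetime(2019, 1, 1, 9, 30, tzinfo=timezone.utc), 1440)
print(keys)  # {'ds_pk': 'Comments:2019-01-01', 'ds_sk': '09:30:00:1a:2', '_ttl': 1546421400}
```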
The IAM role configured for use with the Base table must also contain permission to operate on the Delta table. In this example, the permissions policy for a Base table called Comments and a Delta table called ChangeLog is displayed:
I am a little confused about the best approach for updating two tables with one GraphQL mutation. I am using AWS AppSync.
I have an application where I need a User to be able to register for an Event. Given that I am using DynamoDB as the database, I had thought about a denormalized data structure for the User and Event tables: storing an array of brief Event details, such as eventID and title, in the User table, and an array of entrants in the Events table, holding only brief user info such as userID and name. Firstly, is this a good approach, or should I have a third join table to hold these 'relationships'?
If it's OK, I need to update both tables during the signUp mutation, but I am struggling to understand how to update two tables with one mutation and, in turn, one request mapping template.
Am I right in thinking I need to use a Pipeline resolver? Or is there another way to do this?
There are multiple options for this:
AppSync supports batch operations (BatchPutItem, BatchDeleteItem) to write to multiple DynamoDB tables in a single request
AppSync supports DynamoDB transactions (TransactWriteItems) to update multiple DynamoDB tables atomically in a single request
Pipeline resolvers
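As an illustration of the transaction option, the sketch below builds the two updates a signUp would need: append the event summary to the user's list and the user summary to the event's entrants. It targets the low-level DynamoDB TransactWriteItems API (for example boto3's client("dynamodb").transact_write_items(TransactItems=...)) so it can be read and tested outside AppSync; in AppSync the same two updates would go into a TransactWriteItems request mapping template. Table names, the id key, and the events/entrants attribute names are assumptions.

```python
def _append_update(table: str, key_value: str, list_attr: str, entry: dict) -> dict:
    """One Update action that appends `entry` (string fields only) to a list attribute."""
    return {"Update": {
        "TableName": table,
        "Key": {"id": {"S": key_value}},
        # if_not_exists lets the first append create the list.
        "UpdateExpression": "SET #l = list_append(if_not_exists(#l, :empty), :item)",
        "ExpressionAttributeNames": {"#l": list_attr},
        "ExpressionAttributeValues": {
            ":item": {"L": [{"M": {k: {"S": v} for k, v in entry.items()}}]},
            ":empty": {"L": []},
        },
    }}


def build_signup_transaction(user_id: str, user_name: str, event_id: str, event_title: str,
                             user_table: str = "User", event_table: str = "Event") -> list:
    # Both updates succeed or fail together when sent as one transaction.
    return [
        _append_update(user_table, user_id, "events",
                       {"eventID": event_id, "title": event_title}),
        _append_update(event_table, event_id, "entrants",
                       {"userID": user_id, "name": user_name}),
    ]
```

Unlike the batch operations, a transaction guarantees that a user is never listed on an event without the event also appearing on the user.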
I have some CSV files stored in Blob Storage. Each CSV is updated every day; the update consists of inserting some new rows and modifying some existing rows. I'm using Azure Data Factory (v2) to get that data from Blob Storage and sink it into a SQL database.
The problem is that my process takes around 15 minutes to finish, so I suspect I'm not following best practices.
I don't know exactly how the "Upsert" sink method works, but I think it needs a boolean condition that indicates whether to update the row (if true) or insert it (if false).
I get that condition from a column produced by joining the CSV (source) with the database (destination). This way the column is null if the row is new and not null if the row already exists in the database. So I insert the rows with the null value and just update the others.
Is this the best/correct way to do this kind of upsert? Could I do something better to improve my times?
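The join-based flagging described in the question amounts to splitting the incoming rows by whether their key already exists in the destination. The sketch below models that logic in plain Python; it is a conceptual illustration only, not how Data Factory executes it, and the id key column is an assumption.

```python
def split_for_upsert(source_rows: list, destination_keys: set, key: str = "id") -> tuple:
    """Mimic the left-join flag: rows whose key is absent from the destination
    (the join produced null) are inserts; the rest are updates."""
    inserts, updates = [], []
    for row in source_rows:
        (updates if row[key] in destination_keys else inserts).append(row)
    return inserts, updates


inserts, updates = split_for_upsert([{"id": 1}, {"id": 2}, {"id": 3}], {2})
```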
Are you using Data Flows? If so, you can update your SQL DB using upsert or separate insert/update paths. Set the policy for which values you wish to update in an Alter Row transformation, then set the Sink for Upsert, Update, and/or Insert. You will need to identify the key column on your sink that we will use as the update key on your database.
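By contrast, with an Upsert sink and a nominated key column, the matching on that key happens at the sink, so, as I read the answer, a separate join just to compute an insert/update flag should not be needed. The sketch below models those upsert semantics in memory; it is a conceptual illustration, not Data Factory's implementation, and the names are hypothetical.

```python
def upsert(table: dict, rows: list, key: str = "id") -> tuple:
    """Conceptual upsert keyed on `key`: update when the key exists, else insert."""
    inserted = updated = 0
    for row in rows:
        if row[key] in table:
            table[row[key]].update(row)   # existing key: overwrite the incoming columns
            updated += 1
        else:
            table[row[key]] = dict(row)   # new key: insert the whole row
            inserted += 1
    return inserted, updated
```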
I am new to Cassandra. I am looking at many examples online. Here is one from JHipster Cassandra examples on GitHub:
https://gist.github.com/jdubois/c3d3bedb869466731316
The repository save(user) method does a read (to check for existence), then a delete and re-insert of the existing user across all the denormalized tables whenever the user data changes.
Is this best practice?
Is this only because of how the data model for this sample is designed?
Is this sample's design a result of twisting a POJO framework into a NoSQL database design?
When would I want to just do an update in Cassandra? It supports field-level updates, so it seems like that would be preferred.
First of all, the delete operations should be part of the batch for more robust error handling. But it looks like there are also some concurrency issues with the code. It updates the user based on the user value read earlier, and it's not safe to assume that this is still the latest value by the time save() actually executes. It will also simply overwrite any keys in the lookup tables that might be in use by a different user at that point; e.g. the login could already exist for another user while insertByLoginStmt is executing.
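One way to avoid silently overwriting a login that belongs to someone else is to make the lookup insert conditional. CQL offers this as a lightweight transaction (INSERT ... IF NOT EXISTS), which reports whether it was applied and costs more than a plain write. The sketch below only models those semantics in memory, with a lock standing in for Cassandra's Paxos-based coordination; the class and method names are hypothetical, and this is neither the JHipster code nor a driver API.

```python
import threading


class LoginLookup:
    """In-memory stand-in for a login -> user_id lookup table."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def insert_if_not_exists(self, login: str, user_id: str) -> bool:
        # Models INSERT ... IF NOT EXISTS: applied only if no row with this
        # primary key exists; the return value plays the role of [applied].
        with self._lock:
            if login in self._rows:
                return False
            self._rows[login] = user_id
            return True

    def owner(self, login: str):
        return self._rows.get(login)
```

If the insert is not applied, the caller knows the login is taken and can report an error instead of hijacking another user's lookup row.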
It is not necessary to delete a row before inserting a new one.
But if you are replacing rows and the new columns differ from the existing ones, then you need to delete all existing columns and insert the new ones, or insert the new ones and delete the old ones; the order does not matter if it happens in a batch.
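To make the update-versus-replace distinction concrete: in a denormalized model, a change only forces delete-plus-insert on tables where the changed field is part of the primary key, since Cassandra does not let you update primary key columns; everywhere else a field-level update suffices. The sketch models two hypothetical tables, users (keyed by id) and users_by_login (keyed by login), as dicts and returns the operations each change would need.

```python
def apply_user_change(tables: dict, old: dict, new: dict) -> list:
    """tables maps table name -> (primary key field, rows keyed by that field)."""
    ops = []
    for name, (key_field, rows) in tables.items():
        if old[key_field] == new[key_field]:
            # Primary key unchanged: a field-level UPDATE on this table suffices.
            rows[new[key_field]] = dict(new)
            ops.append(("update", name))
        else:
            # Primary key changed: the old row must be deleted and a new one inserted.
            del rows[old[key_field]]
            rows[new[key_field]] = dict(new)
            ops.extend([("delete", name), ("insert", name)])
    return ops
```

Changing only an email yields plain updates on both tables; changing the login yields a delete and an insert on users_by_login only.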
Is there a way to know the identity column value of a record inserted via InsertOnSubmit beforehand, i.e. before calling the DataContext's SubmitChanges?
Imagine I'm populating some kind of hierarchy in the database, but I don't want to submit changes on each recursive call for each child node (e.g. if I had a Directories table and a Files table and were recreating my filesystem structure in the database).
I'd like to do it this way: create a Directory object, set its name and attributes, then InsertOnSubmit it into the DataContext.Directories collection, then reference Directory.ID in its child Files. Currently I have to call SubmitChanges to insert the 'directory' into the database so that the database mapping fills in its ID column. But this creates a lot of transactions and database round trips, and I imagine that doing the inserts in a batch would perform better.
What I'd like to do is use Directory.ID somehow before committing changes: create all my File and Directory objects in advance and then do one big submit that puts everything into the database. I'm also open to solving this problem via a stored procedure; I assume performance would be even better if all operations were done directly in the database.
One way to get around this is to not use an identity column. Instead build an IdService that you can use in the code to get a new Id each time a Directory object is created.
You can implement the IdService by having a table that stores the last id used. When the service starts up have it grab that number. The service can then increment away while Directory objects are created and then update the table with the new last id used at the end of the run.
Alternatively, and a bit safer, when the service starts up have it grab the last id used and then update the last id used in the table by adding 1000 (for example). Then let it increment away. If it uses 1000 ids then have it grab the next 1000 and update the last id used table. Worst case is you waste some ids, but if you use a bigint you aren't ever going to care.
Since the Directory id is now controlled in code you can use it with child objects like Files prior to writing to the database.
Simply putting a lock around id acquisition makes this safe to use across multiple threads. I've been using this in a situation like yours. We're generating a ton of objects in memory across multiple threads and saving them in batches.
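The block-reservation variant of the IdService described above can be sketched as follows, with a lock around id acquisition as suggested. BlockIdService, InMemoryCounter, and reserve are hypothetical names; InMemoryCounter stands in for the "last id used" table, and in a real system the reservation would have to read and advance the stored value atomically (for example inside one database transaction).

```python
import threading
from typing import Callable


class BlockIdService:
    """Hands out ids from blocks reserved in a 'last id used' store."""

    def __init__(self, reserve_block: Callable[[int], int], block_size: int = 1000):
        # reserve_block(n) must atomically add n to the stored last-id-used
        # value and return the value it had before the increment.
        self._reserve = reserve_block
        self._block_size = block_size
        self._lock = threading.Lock()
        self._next = 0
        self._limit = 0  # exclusive upper bound of the current block

    def next_id(self) -> int:
        with self._lock:  # makes id acquisition safe across threads
            if self._next >= self._limit:
                start = self._reserve(self._block_size)
                self._next, self._limit = start + 1, start + self._block_size + 1
            value = self._next
            self._next += 1
            return value


class InMemoryCounter:
    """Stand-in for the table that stores the last id used."""

    def __init__(self, last_used: int = 0):
        self.last_used = last_used

    def reserve(self, n: int) -> int:
        start = self.last_used
        self.last_used += n
        return start
```

Each new Directory would get next_id() as its ID before InsertOnSubmit, its Files can reference that value immediately, and a single SubmitChanges writes the whole batch. Ids left unused in a block when the process stops are simply skipped.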
This blog post will give you a good start on saving batches in LINQ to SQL.
I'm not sure offhand whether there is a way to run a raw SQL query through LINQ to SQL, but this query will return the current identity value of the specified table:
USE [database];
GO
-- NORESEED reports the current identity value without changing it.
DBCC CHECKIDENT ('schema.table', NORESEED);
GO