DynamoDB: inserting data while creating the table through a Lambda function - aws-lambda

I have a lambda function which I am using to create the DynamoDB table.
I have a requirement wherein I need to insert some data into this table after the table is created.
In DynamoDB, CreateTable is an asynchronous call. While the table is being created it is in the "CREATING" state, and after that it goes into the "ACTIVE" state.
The challenge is that I can't insert data into the table until it is in the "ACTIVE" state; before that I get a ResourceNotFoundException.
Is there any way I can insert this data into the table as part of creating it?
I want to complete the table creation and the data insertion in the same Lambda function call.

As you've discovered, you can only write to a table once it is active (i.e. created), and there is no way to supply data for 'preloading' the table as part of dynamodb::CreateTable.
There are no events emitted when the table becomes ready, so instead you'll have to poll until the table becomes active. This is easily done inside a Lambda, as DynamoDB rarely takes more than 30-60 seconds to provision a table.
After creating the table, call dynamodb::DescribeTable every second (or so) and wait until it returns Table.TableStatus === 'ACTIVE'. Once the table status has turned to active, you can insert your initial data. Just remember to increase your Lambda timeout (up to the 15-minute maximum) in case AWS does take longer to provision your table.
You can see an example of this in the AWS Documentation.
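For example, with the Python SDK (boto3) the built-in table_exists waiter does the DescribeTable polling for you. A minimal sketch; the table name, key schema and item contents below are only placeholders:

import boto3

dynamodb = boto3.client('dynamodb')

# Kick off the asynchronous table creation (schema is illustrative)
dynamodb.create_table(
    TableName='my-table',
    KeySchema=[{'AttributeName': 'id', 'KeyType': 'HASH'}],
    AttributeDefinitions=[{'AttributeName': 'id', 'AttributeType': 'S'}],
    BillingMode='PAY_PER_REQUEST',
)

# Poll DescribeTable until TableStatus is ACTIVE (checks every 5 seconds here)
waiter = dynamodb.get_waiter('table_exists')
waiter.wait(TableName='my-table', WaiterConfig={'Delay': 5, 'MaxAttempts': 60})

# The table is now ACTIVE, so the initial data can be written
dynamodb.put_item(
    TableName='my-table',
    Item={'id': {'S': 'seed-1'}, 'payload': {'S': 'initial data'}},
)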

Related

Redshift: How to fix serializable isolation violation(1023) caused by concurrent MERGE operations?

My use case is to extract, transform and load data incrementally and in real time from x number of Lambda functions. I expect multiple Lambda functions to be running concurrently and Redshift to stay alive for read queries.
Since Redshift doesn't enforce primary key constraints, I'm using the AWS documentation's merge examples ("Example of a merge that replaces existing rows") to enforce unique rows. This method works fine when there is only one instance of the Lambda function running.
-- Start a new transaction
begin transaction;
-- Delete any rows from SALES that exist in STAGESALES, because they are updates
-- The join includes a redundant predicate to collocate on the distribution key
-- A filter on saletime enables a range-restricted scan on SALES
delete from sales
using stagesales
where sales.salesid = stagesales.salesid
and sales.listid = stagesales.listid
and sales.saletime > '2008-11-30';
-- Insert all the rows from the staging table into the target table
insert into sales
select * from stagesales;
-- End transaction and commit
end transaction;
-- Drop the staging table
drop table stagesales;
But as soon as more than one Lambda function runs concurrently and accesses the same table, I receive:
"ERROR: 1023 DETAIL: Serializable isolation violation on table in Redshift" when performing operations in a transaction concurrently with another session.
How should I modify this example to allow it to run in a concurrent environment?
The issue you are running into is that you have multiple Lambda functions executing DML on the same table concurrently. Redshift doesn't support concurrent transactions that are not serializable, i.e. transactions that try to modify the same data at the same time. In that scenario Redshift will abort one or more of the transactions to ensure that all DML that gets executed is serializable.
Your current design will not work properly when scaled to more than one Lambda function because of these restrictions in the way Redshift works. You will need to devise a way of managing the Lambda functions so that conflicting DML statements are not run concurrently against the same table. It's not clear why you are using multiple Lambda functions for this, so I can't comment on what an alternative would look like.
Did you try locking the table in each function's code? That prevents other transactions from modifying the data at the same time. Alternatively, you could give each Lambda its own staging table and have a merge job running in parallel that combines the data from those staging tables and merges it into the final table.
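A minimal sketch of the explicit-lock variant, reusing the table names from the question (LOCK must be issued inside the transaction; note that this serialises the writers, so concurrent Lambdas will queue behind each other):

begin transaction;

-- Take exclusive locks up front so concurrent sessions wait here
-- instead of aborting each other with a serializable isolation violation
lock sales, stagesales;

-- Same merge as above
delete from sales
using stagesales
where sales.salesid = stagesales.salesid
and sales.listid = stagesales.listid;

insert into sales
select * from stagesales;

end transaction;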
A 1023 is a retriable error.
If it happens only from time to time, you can consider catching it in your Lambda function and simply submitting the query again.
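A rough sketch of that retry approach, assuming the Lambda talks to Redshift through psycopg2 (connection details and table names are placeholders):

import time
import psycopg2

# The merge from the question, without explicit begin/end:
# psycopg2 opens the transaction implicitly and commit() closes it
MERGE_SQL = """
delete from sales
using stagesales
where sales.salesid = stagesales.salesid
  and sales.listid = stagesales.listid;
insert into sales select * from stagesales;
"""

def run_merge_with_retry(conn, max_attempts=5):
    for attempt in range(1, max_attempts + 1):
        try:
            with conn.cursor() as cur:
                cur.execute(MERGE_SQL)
            conn.commit()
            return
        except psycopg2.Error as exc:
            conn.rollback()
            # Redshift reports serializable isolation violations as error 1023
            if '1023' in str(exc) and attempt < max_attempts:
                time.sleep(2 ** attempt)  # back off, then resubmit
                continue
            raise

conn = psycopg2.connect(host='my-cluster.example.com', port=5439,
                        dbname='dev', user='etl', password='...')
run_merge_with_retry(conn)
conn.close()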

How to safely update hive external table

I have an external hive table and I would like to refresh the data files on a daily basis. What is the recommended way to do this?
If I just overwrite the files, and we are unlucky enough to have other Hive queries executing in parallel against this table, what will happen to those queries? Will they just fail? Will my HDFS operations fail? Or will they block until the queries complete?
If availability is a concern and space isn't an issue, you can do the following:
Make a synonym for the external table. Make sure all queries use this synonym when accessing the table.
When loading new data, load it to a new table with a different name.
When the load is complete, point the synonym to the newly loaded table.
After an appropriate length of time (long enough for any running queries to finish), drop the previous table.
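Hive has no synonym object as such, but a view can play that role. A rough sketch in HiveQL, with illustrative table names and schema:

-- One-time setup: all queries go through this view instead of the table
CREATE VIEW sales AS SELECT * FROM sales_20200101;

-- Daily refresh: load the new files into a brand-new external table
CREATE EXTERNAL TABLE sales_20200102 (id INT, amount DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/sales/20200102';

-- Repoint the "synonym" at the newly loaded table
ALTER VIEW sales AS SELECT * FROM sales_20200102;

-- Once any running queries have had time to finish, drop yesterday's table
DROP TABLE sales_20200101;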
First of all, when a table is being accessed it can hold two types of locks: an exclusive lock (while data is being written) and a shared lock (while data is being read). So if you run an INSERT OVERWRITE to add data to the table, any other queries that access it at the same time won't execute, because the table holds an exclusive lock; once the INSERT OVERWRITE completes, the table can be accessed again.
Please refer to the following link:
https://cwiki.apache.org/confluence/display/Hive/Locking
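If you want to see which locks are held while the refresh is running, Hive can list them, assuming a lock manager is enabled in your configuration (table names here are illustrative):

-- Session 1: the refresh holds an exclusive lock for its duration
INSERT OVERWRITE TABLE sales SELECT * FROM sales_staging;

-- Session 2: readers wait on that lock; inspect it with
SHOW LOCKS sales;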

Can we pass parameters in triggers in Oracle?

Can we pass parameters to triggers, as we can with procedures and functions? Can a trigger be called explicitly?
An object-based trigger is fired by the occurrence of an event (such as an insert, update or delete) on a specific object of the database. There are also system triggers, fired by system-specific events (such as database shutdown or startup, a user connecting, etc.).
That is the main purpose of a trigger in a database: you can't fire it explicitly; if you want it to run, the only way is to raise the event. Passing parameters isn't part of a trigger's definition either, but you can work with the event's attributes, which are available to the trigger body (and the body may in turn call functions or procedures).
I hope this answers your question. May I ask why you need to do this?
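For instance, a row-level trigger has no parameter list, but it can read the event's attributes through the :OLD and :NEW pseudo-records. A small sketch (the employees and salary_audit tables are assumed to exist and are only illustrative):

CREATE OR REPLACE TRIGGER emp_salary_audit
AFTER UPDATE OF salary ON employees
FOR EACH ROW
BEGIN
  -- :OLD and :NEW act as the trigger's implicit "parameters"
  INSERT INTO salary_audit (emp_id, old_salary, new_salary, changed_at)
  VALUES (:OLD.emp_id, :OLD.salary, :NEW.salary, SYSDATE);
END;
/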
What you can do is create a table that temporarily stores the data you want to access in your trigger.
1- Create a table, "tmp_data" for instance.
2- Before running the event that will fire the trigger (a stored procedure, an insert, an update...), insert into tmp_data the data that you want to use in the trigger.
3- In the trigger, query the table tmp_data to access the data you need.
4- After you are done with the data, clean out tmp_data for the next use.
Hope this has been helpful!
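A minimal PL/SQL sketch of that pattern; every name here (tmp_data, orders, orders_audit) is illustrative, and it assumes a matching row has been staged before the update:

-- 1) Scratch table that carries the "parameters"
CREATE TABLE tmp_data (
  order_id NUMBER,
  reason   VARCHAR2(100)
);

-- 2) Before the triggering DML, stage the values the trigger should see
INSERT INTO tmp_data (order_id, reason) VALUES (42, 'manual adjustment');

-- 3) The trigger reads them back
CREATE OR REPLACE TRIGGER orders_au
AFTER UPDATE ON orders
FOR EACH ROW
DECLARE
  v_reason tmp_data.reason%TYPE;
BEGIN
  SELECT reason INTO v_reason
  FROM tmp_data
  WHERE order_id = :NEW.order_id;

  INSERT INTO orders_audit (order_id, reason, changed_at)
  VALUES (:NEW.order_id, v_reason, SYSDATE);
END;
/

-- 4) Afterwards, clean up for the next use
DELETE FROM tmp_data WHERE order_id = 42;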

BigQuery: inserting rows, but nothing is written

I am inserting a list of rows using the tabledata().insertAll method of the BigQuery client. After the call, the response shows no errors. However, my tables still end up with no data written.
Could this be a permissions problem? If so, why are no errors returned?
Thanks
This can happen if you do the insert right after deleting and re-creating the table.
The streaming buffer of a deleted table is not deleted right at the time that table is deleted, which can cause new inserts to be delivered to this old streaming buffer.
From BigQuery documentation:
deleting and/or recreating a table may create a period of time where streaming inserts are effectively delivered to the old table and will not be present in the newly created table.
And in this case, no errors would be returned.
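One quick way to confirm this is to look at the table's streaming buffer right after the insert. A sketch with the Python client library; the table id and row contents are placeholders:

from google.cloud import bigquery

client = bigquery.Client()
table_id = 'my-project.my_dataset.my_table'  # placeholder

# Streaming insert: an empty error list only means the call itself succeeded
errors = client.insert_rows_json(table_id, [{'id': 1, 'name': 'first row'}])
print('insert errors:', errors)

# Rows sit in the streaming buffer before they become visible in storage;
# if the table was just deleted and recreated, the buffer may still belong
# to the old table
table = client.get_table(table_id)
print('streaming buffer:', table.streaming_buffer)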
References:
https://cloud.google.com/bigquery/troubleshooting-errors#metadata-errors-for-streaming-inserts
https://github.com/GoogleCloudPlatform/google-cloud-php/issues/871#issuecomment-361339800
https://cloud.google.com/bigquery/streaming-data-into-bigquery

How to find the number of times a record is accessed in Oracle

I have an Oracle 10g database in my application. How can I find out how many times a particular record has been accessed in a particular table?
In general, to do that you need to access the records in a table through a stored procedure, not through a SELECT statement.
But, here is how it could be simplified:
you add a requirement that any SELECT to your table should have a function call:
select yourtable.* from yourtable
where yourfunct('yourtable', yourtable.key) = 'done'
this can easily be done with a view, plus revoking permissions to read the table itself
in your function, you either save the table/key pair in a collection inside a package (you don't need to start a transaction to do that), or you start an autonomous transaction and write into a real table.
writing into a package variable is not thread safe, but it is much faster.
creating a transaction is slower, but it guarantees a result.
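A rough sketch of the autonomous-transaction variant; the names (access_log, yourfunct, yourtable, key) are illustrative and this is only one way to wire it up:

-- Log table that records each access
CREATE TABLE access_log (
  table_name  VARCHAR2(30),
  record_key  NUMBER,
  accessed_at DATE
);

-- Function that logs the access in its own transaction and returns 'done'
CREATE OR REPLACE FUNCTION yourfunct (p_table_name IN VARCHAR2, p_key IN NUMBER)
RETURN VARCHAR2
IS
  PRAGMA AUTONOMOUS_TRANSACTION;
BEGIN
  INSERT INTO access_log (table_name, record_key, accessed_at)
  VALUES (p_table_name, p_key, SYSDATE);
  COMMIT;
  RETURN 'done';
END;
/

-- View that forces the logging call; grant SELECT on the view, not the table
CREATE OR REPLACE VIEW yourtable_v AS
SELECT t.*
FROM yourtable t
WHERE yourfunct('yourtable', t.key) = 'done';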
My personal preference would be to question the original task. Maybe it would be enough to create a sort of 'log entry' table where requests for data are recorded.
