Is it possible to "purge"/"clear" the iterator of a DynamoDB table's trigger?
Context:
A lambda is processing updates on a given table using a trigger.
Badly formatted records are inserted into the table and the lambda is unable to process them, stalling the iterator.
Some time later a fix is issued so that correctly formatted records are inserted into the table. We now want to "fast forward" the processing of updates and skip the old ones.
I suppose deleting/recreating the trigger would do that. Is there a "better" way?
Old question, but this can help somebody else:
as stated in the comments, I don't think it's possible to simply purge the queue.
However, when you re-create it, you can choose the "starting position" between:
"Latest" (will skip all previously stored messages)
"Trim horizon" (will process all the available messages)
Re-creating the trigger is pretty easy; that's what I did in a similar case recently.
ClickHouse recently released the new DELETE query, which allows you to quickly mark data as deleted (https://clickhouse.com/docs/en/sql-reference/statements/delete/).
The actual data deletion is done in the background afterwards, as is written in the docs.
My question is, is there some indication to when data would be deleted?
Or is there a way to make sure it gets deleted?
I'm asking for GDPR compliance purposes.
Thanks
For GDPR it is better to use
ALTER TABLE ... DELETE WHERE ...
and poll
SELECT * FROM system.mutations WHERE is_done = 0
When the mutation is complete (is_done = 1), the data has been deleted physically.
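A minimal sketch of that approach (the table and column names here are hypothetical):

-- issue the mutation; the affected parts are rewritten in the background
ALTER TABLE users DELETE WHERE user_id = 42;

-- poll until no unfinished mutations remain for the table
SELECT mutation_id, command, is_done
FROM system.mutations
WHERE database = currentDatabase() AND table = 'users' AND is_done = 0;

-- once this query returns no rows (every mutation has is_done = 1),
-- the matching rows have been physically removed from the data parts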
According to ClickHouse support on Slack:
If you are using the DELETE query, the only way to make sure data is deleted is to use the OPTIMIZE FINAL query.
This triggers a merge, and the deleted data won't be included in the resulting parts.
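A minimal sketch of that combination (again, the table name is hypothetical):

-- the lightweight DELETE only marks the rows as deleted
DELETE FROM users WHERE user_id = 42;

-- force a merge so the marked rows are physically dropped from the parts
OPTIMIZE TABLE users FINAL;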
I'm looking at doing a data-fix and need to be able to prove that the data I intended to change is the only data changed (for example, that a trigger hasn't modified additional columns I wasn't expecting).
I've been looking at Oracle's Flashback Query process, which would be great, except that this is not enabled on the database in question.
Since this check would be carried out prior to committing the transaction, Oracle must have the "before" information squirreled away somewhere, and I wondered if there is any way of accessing this undo information?
Otherwise, I would potentially have to make a temporary copy of each table and compare the live table against the copy, which may also introduce inconsistencies between the time of the backup query and the time of the update transaction.
While I'm expecting the answer "no", I'm hoping someone can point me in a better direction than that which I appear to be headed at present.
Thanks!
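For reference, a minimal sketch of the compare-against-a-copy fallback mentioned above (table names are hypothetical; note that CREATE TABLE ... AS SELECT commits, so the copy has to be taken before the fix starts):

-- snapshot the table before the data-fix
CREATE TABLE emp_before AS SELECT * FROM emp;

-- ... run the (uncommitted) data-fix ...

-- rows that were changed or removed by the fix
SELECT * FROM emp_before
MINUS
SELECT * FROM emp;

-- rows that were added, or the new versions of changed rows
SELECT * FROM emp
MINUS
SELECT * FROM emp_before;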
I am new to Teradata and fortunately got a chance to work with both DDL and DML statements.
One thing I observed is that Teradata is very slow when it comes to UPDATEing data in a table with a large number of records.
The simplest way I found on Google to perform this update is to write an INSERT-SELECT statement with a CASE expression on the column whose values need to be replaced with new ones (see the sketch below).
But what about when this situation arises in a data warehouse environment, where we need to update multiple columns in a table holding millions of rows?
Which would be the best approach to follow ?
INSERT-SELECT only OR MERGE-UPDATE OR MLOAD ?
I'm also not sure whether any of the above approaches is unsuitable for this kind of UPDATE operation.
Thank you in advance!
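For reference, a minimal sketch of the INSERT-SELECT-with-CASE idea mentioned above (table and column names are hypothetical; the corrected rows are written into an empty copy of the table):

-- rewrite the values while copying into an empty target
INSERT INTO target_new (key_col, status)
SELECT key_col,
       CASE WHEN status = 'OLD' THEN 'NEW' ELSE status END AS status
FROM   target;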
At enterprise level we expect volumes to be huge, and updates are often part of scheduled jobs/scripts.
With huge volumes of data, an UPDATE is a costly operation that carries the risk of blocking the table for some time if the update fails (because of the rollback driven by the transient journal). Although scripts are tested well and failures seldom happen in production environments, it's usually better to load the data that needs to be updated into a temporary table in the required form, delete the matching records from the target table, and insert the transformed rows back, maintaining SCD-1 (where we don't keep history).
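A minimal sketch of that delete-and-insert pattern (table and column names are hypothetical, assuming a table with just a key column and a status column):

-- stage the rows that need updating, already in their corrected form
CREATE VOLATILE TABLE stg_target AS (
    SELECT key_col,
           'NEW' AS status          -- the corrected value
    FROM   target
    WHERE  status = 'OLD'           -- only the rows that need the update
) WITH DATA ON COMMIT PRESERVE ROWS;

-- remove the rows that are about to be replaced
DELETE FROM target
WHERE  key_col IN (SELECT key_col FROM stg_target);

-- insert the corrected rows back (SCD-1: no history kept)
INSERT INTO target (key_col, status)
SELECT key_col, status FROM stg_target;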
I'm attempting to design an Ab Initio load process without any Ab Initio training or documentation. Yeah, I know.
A design decision is: for the incoming data files there will be inserts and updates.
Should I have the feed provider split them into two data files (1-10 GB in size nightly) and have Ab Initio do the inserts and updates separately?
A problem I see with that is that data isn't always what you expect it to be...
An INSERT row may already be present (perhaps a purge failed or the feed provider made a mistake),
or an UPDATE row isn't present.
So I'm wondering if I should just combine all inserts and updates... and use an Oracle MERGE statement
(after parallel loading the data into a staging table with no indexes, of course).
But I don't know if Ab Initio supports MERGE or not.
There isn't much in the way of Ab Initio tutorials or docs on the web... can you direct me to anything good?
The solution you just described (loading the inserts and updates into a staging table and then merging the content into the main table) is feasible.
A design decision is: for the incoming data files there will be inserts and updates.
I don't know the background of this decision, but you should know that this solution will result in a longer execution time. In order to execute inserts and updates you have to use the "Update Table" component, which is slower than the simpler "Output Table" component. By the way, don't use the same "Update Table" component for inserts and updates simultaneously; use a separate "Update Table" for inserts and another one for updates instead (you'll see a dramatic performance boost this way). (If you can change the above-mentioned design decision, then use an "Output Table" instead.)
In either case, set the "Update Table"/"Output Table" components to "never abort" so that your graph won't fail if the same insert statement occurs twice or if there's no row to update.
Finally, the Oracle MERGE statement should be fired/executed from a "Run SQL" component when the processing of all the inserts and updates is finished (a sketch of such a MERGE is at the end of this answer). Use phases to make sure it happens this way...
If you intend to build a graph with parallel execution, then make sure that the insert and update statements for the same entries will be processed by the same partitions (use the primary key of the final table as the key in the "partition by key" component).
If you want an overview of how many duplicated inserts or wrong updates occur in your messy input, then use the "Reject" (and possibly "Error") ports of the appropriate "Update Table"/"Output Table" components for further processing.
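A minimal sketch of the MERGE that the "Run SQL" component could fire (table and column names are hypothetical):

-- apply the staged feed to the main table in one statement
MERGE INTO target t
USING staging s
ON (t.id = s.id)
WHEN MATCHED THEN
    UPDATE SET t.col1 = s.col1,
               t.col2 = s.col2
WHEN NOT MATCHED THEN
    INSERT (id, col1, col2)
    VALUES (s.id, s.col1, s.col2);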
I would certainly not rely on a source system to tell me whether rows are present in the target table or not. My instinct says to go for a parallel, nologging (if possible), compressed (if possible) load into a staging table, followed by a MERGE; if Ab Initio does not support MERGE then hopefully it supports a call to a PL/SQL procedure or direct execution of a SQL statement.
If this is a large amount of data I'd like to arrange hash partitioning on the join key for the new and current data sets too.
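A rough sketch of the kind of staging load this answer has in mind (object names are hypothetical; external_feed stands in for however the files are exposed to the database):

-- empty staging table with no indexes, direct-path friendly
CREATE TABLE stg_feed NOLOGGING COMPRESS
AS SELECT * FROM target WHERE 1 = 0;

-- direct-path, parallel load of the combined insert/update feed
ALTER SESSION ENABLE PARALLEL DML;
INSERT /*+ APPEND PARALLEL(stg_feed, 8) */ INTO stg_feed
SELECT * FROM external_feed;
COMMIT;

-- then MERGE stg_feed into target as sketched in the previous answer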
Is there a way to exclusively lock a table for reading in Oracle (10g)? I am not very familiar with Oracle, so I asked the DBA, and he said it's impossible to lock a table for reading in Oracle. Is that right?
I am actually looking for something like the SQL Server (TABLOCKX HOLDLOCK) hints.
EDIT:
In response to some of the answers: the reason I need to lock a table for reading is to implement a queue that can be read by multiple clients, but it should be impossible for 2 clients to read the same record. So what actually happens is:
Lock table
Read next item in queue
Remove item from the queue
Remove table lock
Maybe there's another way of doing this (more efficiently)?
If you just want to prevent any other session from modifying the data, you can issue
LOCK TABLE whatever IN EXCLUSIVE MODE
/
This blocks other sessions from updating the data, but we cannot block other people from reading it.
Note that in Oracle such table locking is rarely required, because Oracle operates a policy of read consistency, which means that if we run a query that takes fifteen minutes to complete, the last row returned will be consistent with the first row; in other words, if the result set had been sorted in reverse order we would still see exactly the same rows.
EDIT:
If you want to implement a queue (without actually using Oracle's built-in Advanced Queuing functionality), then SELECT ... FOR UPDATE is the way to go. This construct allows one session to select and lock one or more rows, while other sessions can update the unlocked rows. However, implementing a genuine queue is quite cumbersome, unless you are using 11g: it is only in the latest version that Oracle has supported the SKIP LOCKED clause. Find out more.
1. Lock table
2. Read next item in queue
3. Remove item from the queue
4. Remove table lock
Under this model a lot of sessions are going to be doing nothing but waiting for the lock, which seems a waste. Advanced Queuing would be a better solution.
If you want a 'roll-your-own' solution, you can look into SKIP LOCKED. It wasn't documented until 11g, but it is present in 10g. In this algorithm you would do
1. SELECT item FROM queue WHERE ... FOR UPDATE SKIP LOCKED
2. Process item
3. Delete the item from the queue
4. COMMIT
That would allow multiple processes to consume items off the queue.
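A minimal sketch of that consumer flow (the queue table and its columns are hypothetical; a real consumer would normally also limit how many rows it picks up per iteration):

-- 1. lock available items, skipping rows that other consumers already hold
SELECT item_id, payload
FROM   queue_items
WHERE  status = 'READY'
FOR UPDATE SKIP LOCKED;

-- 2. process the locked item(s) in the application

-- 3. delete the processed item, then 4. COMMIT to release the locks
DELETE FROM queue_items WHERE item_id = :processed_id;
COMMIT;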
The TABLOCKX and HOLDLOCK hints you mentioned appear to be used for writes, not reads (based on http://www.tek-tips.com/faqs.cfm?fid=3141). If that's what you're after, would a SELECT FOR UPDATE fit your need?
UPDATE: Based on your update, SELECT FOR UPDATE should work, assuming all clients use it.
UPDATE 2: You may not be in a position to do anything about it right now, but this sort of problem is actually an ideal fit for something other than a relational database, such as AMQP.
If you mean, lock a table so that no other session can read from the table, then no, you can't. Why would you want to do that anyway?