TTL for every table in a database - ClickHouse

I have a ClickHouse database for logs, and I only want to keep the last day of them. I also have a mechanism that groups logs by app_name: it creates a table in my database for each app and pushes that app's logs into its table. So the main question is: how can I specify a TTL for every table that will be created in the database?
I have done this manually for individual tables with a basic TTL clause, but I can't find anything that applies to the whole database.

You can't set a TTL at the database level, only at the table or column level: https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/mergetree#table_engine-mergetree-ttl
You'll need to either schedule ALTER commands or modify your table creation logic.
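A minimal sketch of both options, assuming (hypothetically) that every app table lives in a database called logs, is named app_<app_name>, and has a DateTime column ts. create_table_sql bakes a one-day TTL into the CREATE TABLE your mechanism already issues; LIST_TABLES_SQL and alter_ttl_sql let a scheduled job find the existing MergeTree-family tables in system.tables and apply ALTER TABLE ... MODIFY TTL to each. The code only builds SQL strings; sending them through your ClickHouse client is left out.

```python
import re

# Hypothetical names: adjust to your own database and schema.
DATABASE = "logs"
TTL_COLUMN = "ts"  # a DateTime column assumed to exist in every app table
TTL_EXPR = f"{TTL_COLUMN} + INTERVAL 1 DAY"

# Only plain identifiers are accepted, so names can be interpolated safely.
_SAFE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _checked(name: str) -> str:
    if not _SAFE_NAME.fullmatch(name):
        raise ValueError(f"unsafe table name: {name!r}")
    return name


def create_table_sql(app_name: str) -> str:
    """Option 1: include the TTL in the table creation logic itself."""
    table = _checked(f"app_{app_name}")
    return (
        f"CREATE TABLE IF NOT EXISTS {DATABASE}.{table} (\n"
        f"    {TTL_COLUMN} DateTime,\n"
        "    message String\n"
        ") ENGINE = MergeTree\n"
        f"ORDER BY {TTL_COLUMN}\n"
        f"TTL {TTL_EXPR}"
    )


# Option 2, step 1: a scheduled job lists the MergeTree-family tables...
LIST_TABLES_SQL = (
    "SELECT name FROM system.tables "
    f"WHERE database = '{DATABASE}' AND engine LIKE '%MergeTree'"
)


def alter_ttl_sql(table: str) -> str:
    """Option 2, step 2: ...and (re)applies the TTL to each table it found."""
    return f"ALTER TABLE {DATABASE}.{_checked(table)} MODIFY TTL {TTL_EXPR}"
```

Changing the TTL of a large existing table may cause ClickHouse to rewrite existing data parts, so it is worth trying the ALTER on a single table first.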

Related

Data Readiness Check

Let's say there is a job A that runs a Python script which connects to Oracle, fetches the data from Table A and loads it into Snowflake once a day. Application A, which depends on Table A in Snowflake, can simply depend on the success of job A for further processing; this is easy.
But if the data is moved via replication (change data capture from Oracle to S3 using GoldenGate, pipes push it into a stage, and a stream is applied to the target by a task every few minutes), what is the best way to let Application A know that the data is ready? How can we check whether the data is ready? Is there something available in Oracle, like a table-level marker, that can be moved over to Snowflake? The tables in Oracle cannot be modified to add anything new, and marker rows cannot be added either; these are impractical. But something that Oracle provides implicitly and that can be moved over to Snowflake, or some SCN-like number at the table level that can be compared every few minutes, could be a solution. I'm eager to hear about any approaches.

AWS DMS with CDC: update records only include the updated fields. How do I include all of them?

We recently started the process of continuous migration (initial load + CDC) from an Oracle database on RDS to S3 using AWS DMS. The DB is using LogMiner.
The problem we have detected is that CDC records of type Update only contain the fields that were updated and leave the rest empty, so we lose the ability to simply take the record with the maximum timestamp as the valid one.
Does anyone know whether this can be changed, or which part of the DMS or RDS configuration to adjust so that an update record contains all the fields of the row?
Thanks in advance.
Supplemental logging at the table level can increase what is logged, but it will also increase the total volume of log data written for a given workload.
Many log-based data replication products from various vendors require additional supplemental logging at the table level to ensure that the full row data for updates, with before and after change data, is written to the database logs.
Reference: https://docs.oracle.com/database/121/SUTIL/GUID-D857AF96-AC24-4CA1-B620-8EA3DF30D72E.htm#SUTIL1582
Pulling data through LogMiner may be possible, but you will need to evaluate whether it will scale to the data volumes you need.
DMS full load + CDC also supports Binary Reader, which is a better option than LogMiner. To capture updates with all the columns, use "ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS" (as part of ALTER TABLE) on the Oracle side.
This pushes all the columns of an update record to the endpoint from Oracle RAC and non-RAC databases. Another pointer for CDC: use TRANSACT_ID on the DMS side to generate a unique sequence for each record. Redo volume will be a little higher, but it is what it is; you can keep an eye on it and drop the supplemental logging at the table level if required.
Cheers!
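The table-level DDL from the answer above can be generated for every table in the DMS task instead of being typed by hand. This is a sketch under assumptions: the (schema, table) pairs are hypothetical, and the function only builds the statements, which must be run by a user allowed to alter those tables. DMS also expects minimal supplemental logging at the database level, which on RDS is enabled through Amazon's rdsadmin package rather than ALTER DATABASE, so the DMS documentation for Oracle sources is worth checking as well.

```python
import re

# Unquoted Oracle identifiers: a letter first, then letters, digits, _, $ or #.
_IDENT = re.compile(r"[A-Za-z][A-Za-z0-9_$#]*")


def _ident(name: str) -> str:
    if not _IDENT.fullmatch(name):
        raise ValueError(f"invalid Oracle identifier: {name!r}")
    return name.upper()  # Oracle stores unquoted identifiers in upper case


def supplemental_logging_ddl(tables, enable=True):
    """Return one ALTER TABLE statement per (schema, table) pair.

    ADD ... (ALL) COLUMNS makes Oracle write every column of a changed row
    to the redo log, so DMS can deliver full row images for updates.
    enable=False produces the matching DROP, for tables where the extra
    redo volume turns out not to be worth it.
    """
    action = "ADD" if enable else "DROP"
    return [
        f"ALTER TABLE {_ident(schema)}.{_ident(table)} "
        f"{action} SUPPLEMENTAL LOG DATA (ALL) COLUMNS"
        for schema, table in tables
    ]


if __name__ == "__main__":
    # Hypothetical tables taken from a DMS task's table mappings.
    for stmt in supplemental_logging_ddl([("sales", "orders"), ("sales", "order_items")]):
        print(stmt)
```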

Update database records based on date column

I'm working on an app where some entities in the database have a column holding the date until which that entity is available for certain actions. When it expires, I need to change its state, meaning I update a column that represents its state.
What I'm doing so far is this: whenever I ask the database for those entities to do something with them, I first check whether they have expired, and if they have, I update them. I don't particularly like this approach, since it means I will have a bunch of records in the database in the wrong state just because I haven't queried them. Another approach would be a periodic task that runs over those records and updates them as necessary. I don't like that either, since again I would have records in an inconsistent state, and in that case the first approach seems more reasonable.
Is there another way of doing this, or am I missing something? I should mention that I use Spring Boot + Hibernate for my application, and the underlying database is PostgreSQL. Is there any technology-specific trick I can use to get what I want?
There is no "expired" trigger type in the database. If you have something that expires and you need to act on it, there are two solutions (you wrote about both of them): do some extra work on the expired records before you use the data, or run a cron job/task (either at the database level or on the server side).
I recommend the cron approach. Here is why:
Do something with expired records before you get the data
(update before select):
+: You update expired data before you need it. But then the question is whether to update only what you requested or everything that has expired. Updating everything can be time-consuming if, out of all the records, you need just 2 and end up updating 2,000 records that are unrelated to your working dataset.
-: Updating all records takes a long time. If the database is shared (accessed not only through your application), the expiry logic is not executed for the other clients. You need to control every entry point where expired records should or shouldn't be handled. If expiry is measured in minutes or seconds, then even right after you execute the expiry logic, new records may expire in the next second. Also, if you need to change the workflow logic for handling expired data, a cron job keeps it in one place; with update-before-select, you have to change that logic everywhere it is applied.
CRON/TASK
-: You have to spend time configuring it, but only once, 30-60 minutes max :)
+: It runs in the background. If your database is used not only by your application, the expired-data logic still applies. You don't have to check for stale data in your Java code before selecting something (or remember to do so, or explain it to every new employee). The logic that cares about stale data is separated from the normal queries to the database.
You can lock the rows with SELECT ... FOR UPDATE in the cron job. A server-side query that also locks those rows (SELECT ... FOR UPDATE or FOR SHARE) while the job runs will wait until the stale-data logic completes and will then get up-to-date data. Note that in PostgreSQL a plain SELECT is not blocked by row locks; it simply sees the last committed state.
For Spring:
the Spring scheduling documentation, or the simple example spring-quartz-schedule
For the database level: a PostgreSQL job scheduler
A scheduler/cron job is the best practice for this kind of thing.
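The cron approach boils down to one set-based UPDATE run on a schedule. The sketch below uses Python's built-in sqlite3 so that it runs anywhere; the table entity and its columns id, state and available_until are hypothetical stand-ins for your own schema, and in PostgreSQL you would compare a real timestamp column against now() instead of ISO-8601 text.

```python
import sqlite3
from datetime import datetime, timezone

EXPIRE_SQL = (
    "UPDATE entity SET state = 'EXPIRED' "
    "WHERE available_until < ? AND state <> 'EXPIRED'"
)


def expire_entities(conn: sqlite3.Connection, now: datetime) -> int:
    """Mark every overdue entity as EXPIRED in a single set-based UPDATE.

    Returns the number of rows changed. Rows already EXPIRED are skipped,
    so running the job again (e.g. every minute) is harmless.
    Timestamps are stored as ISO-8601 strings in UTC with second precision;
    text comparison matches chronological order only because every value
    shares that one format.
    """
    cutoff = now.astimezone(timezone.utc).isoformat(timespec="seconds")
    with conn:  # commit on success, roll back on error
        cur = conn.execute(EXPIRE_SQL, (cutoff,))
    return cur.rowcount
```

In the Spring Boot application, the same UPDATE could sit in a method annotated with @Scheduled (with @EnableScheduling on a configuration class); at the database level, an extension such as pg_cron can run it without involving the application at all.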

Count inserts, deletes and updates in a PowerCenter session

Is there a way in PowerCenter 9.1 to get the number of inserts, deletes and updates after a session runs? I can see the data in the log, but I would like to see it in a more orderly fashion, in a table.
The only way I know of requires building the mapping appropriately. You need three separate instances of the target and a Router to redirect each row to TARGET_insert, TARGET_update or TARGET_delete. Workflow Monitor will then show a separate row for the inserted, updated and deleted rows.
There are a few ways:
1. You can use $TgtSuccessRows / $TgtFailedRows and assign them to a workflow variable.
2. An Expression transformation with a variable port can be used to keep track of inserts/updates/deletes.
3. You can even query OPB_SESSLOG in a second stream to get the row count within the same session.
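Option 2 comes down to keeping one counter per row operation. In a mapping, each row's update-strategy flag is one of PowerCenter's constants DD_INSERT (0), DD_UPDATE (1), DD_DELETE (2) or DD_REJECT (3). The Python sketch below mimics what an Expression transformation with one variable port per operation would do, using a hypothetical input of one flag per row; it illustrates the counting logic and is not PowerCenter code.

```python
from collections import Counter

# PowerCenter's update-strategy constants.
DD_INSERT, DD_UPDATE, DD_DELETE, DD_REJECT = 0, 1, 2, 3

_LABELS = {
    DD_INSERT: "inserts",
    DD_UPDATE: "updates",
    DD_DELETE: "deletes",
    DD_REJECT: "rejects",
}


def count_by_strategy(flags):
    """Tally rows per update-strategy flag.

    Equivalent to one variable port per operation that increments when the
    current row's flag matches. flags is an iterable of DD_* values, one per
    row (hypothetical input). An unknown flag raises KeyError instead of
    being silently ignored.
    """
    counts = Counter(_LABELS[flag] for flag in flags)
    # Counter returns 0 for missing keys, so every operation is reported.
    return {label: counts[label] for label in _LABELS.values()}
```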
I'm not sure whether PowerCenter 9.1 offers a built-in solution to this problem.
You can design your mapping to populate an audit table that tracks the number of inserts/updates/deletes.
You can download a sample implementation from the Informatica Marketplace block titled "PC Mapping : Custom Audit Table"
https://community.informatica.com/solutions/mapping_custom_audit_table
There are multiple ways. For example, you can create an Assignment task and attach it right after your session. Once the session completes its run, the Assignment task passes the session statistics (such as $session.Status, $session.TgtSuccessRows, etc.) to workflow variables defined at the workflow level. Then create a worklet that contains a mapping, and pass the session statistics captured at the workflow level to the worklet, and from the worklet to the mapping. Once the statistics are available at the mapping level, read them in the mapping (using a SQL or Expression transformation) and write them to the AUDIT table. Attach this combination of Assignment task and worklet after each session, and it will capture each session's statistics after the session completes its run.
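The last step of that approach, writing the captured statistics into the AUDIT table, is a plain INSERT. Here is a sketch using Python's sqlite3 for illustration; the table layout, column names and session name are hypothetical, and in PowerCenter the values would arrive as mapping parameters passed down from the workflow variables.

```python
import sqlite3

# Hypothetical audit table layout; adapt the columns to what you capture.
AUDIT_DDL = """
CREATE TABLE IF NOT EXISTS audit (
    session_name TEXT    NOT NULL,
    status       TEXT    NOT NULL,
    tgt_success  INTEGER NOT NULL,
    tgt_failed   INTEGER NOT NULL,
    logged_at    TEXT    NOT NULL
)
"""


def write_audit_row(conn: sqlite3.Connection, session_name: str,
                    stats: dict, logged_at: str) -> None:
    """Insert one row of session statistics into the audit table.

    stats is assumed to hold the values the Assignment task captured, e.g.
    {"status": ..., "tgt_success": ..., "tgt_failed": ...}.
    """
    with conn:  # commit on success, roll back on error
        conn.execute(AUDIT_DDL)
        conn.execute(
            "INSERT INTO audit (session_name, status, tgt_success, tgt_failed, logged_at) "
            "VALUES (?, ?, ?, ?, ?)",
            (session_name, stats["status"], stats["tgt_success"],
             stats["tgt_failed"], logged_at),
        )
```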

Auditing in Oracle

I need some help with auditing in Oracle. We have a database with many tables, and we want to be able to audit every change made to any field of any table. The things we want to have in this audit are:
user who made the change
time the change occurred
old value and new value
So we started creating a trigger that was supposed to perform the audit for any table, but then ran into issues...
As I mentioned before, we have so many tables that we cannot create a trigger for each one. So the idea is to create a master trigger that behaves dynamically for whichever table fires it. I tried to do it, but with no luck at all... it seems that Oracle restricts a trigger to the one table declared in its code, rather than letting it work dynamically the way we want.
Do you have any idea how to do this, or any other advice for solving this issue?
If you have 10g enterprise edition you should look at Oracle's Fine-Grained Auditing. It is definitely better than rolling your own.
But if you have a lesser version or for some reason FGA is not to your taste, here is how to do it. The key thing is: build a separate audit table for each application table.
I know this is not what you want to hear because it doesn't match the table structure you outlined above. But storing a row with OLD and NEW values for each column affected by an update is a really bad idea:
It doesn't scale (a single update touching ten columns spawns ten inserts)
What about when you insert a record?
It is a complete pain to assemble the state of a record at any given time
So, have an audit table for each application table, with an identical structure. That means including the CHANGED_TIMESTAMP and CHANGED_USER on the application table, but that is not a bad thing.
Finally, and you know where this is leading, have a trigger on each table which inserts a whole record with just the :NEW values into the audit table. The trigger should fire on INSERT, UPDATE and DELETE. This gives the complete history, and it is easy enough to diff two versions of the record. For a DELETE you will insert an audit record with just the primary key populated and all other columns empty.
Your objection will be that you have too many tables and too many columns to implement all these objects. But it is simple enough to generate the table and trigger DDL statements from the data dictionary (user_tables, user_tab_columns).
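Generating the audit objects from the data dictionary can be sketched as follows. The input stands in for rows you would read from user_tab_columns (column name plus full type such as VARCHAR2(10)); the _AUD suffix, the trigger name and the single-column primary key are hypothetical conventions. The generated trigger follows the answer: on INSERT and UPDATE it copies the :NEW values, on DELETE it writes only the primary key.

```python
def audit_ddl(table, columns, pk):
    """Return (create_table_sql, create_trigger_sql) for one application table.

    table:   table name, e.g. "EMP"
    columns: [(column_name, full_oracle_type), ...] in column order, already
             including CHANGED_TIMESTAMP and CHANGED_USER
    pk:      primary-key column name (a single-column key is assumed)
    """
    aud = f"{table}_AUD"  # hypothetical naming convention
    # No constraints on the audit table, so DELETE rows with NULLs are accepted.
    col_defs = ",\n  ".join(f"{name} {typ}" for name, typ in columns)
    names = ", ".join(name for name, _ in columns)
    new_vals = ", ".join(f":NEW.{name}" for name, _ in columns)

    create_table = f"CREATE TABLE {aud} (\n  {col_defs}\n)"
    # Before Oracle 12.2, identifiers are limited to 30 bytes, so long table
    # names may need a shorter trigger name.
    create_trigger = (
        f"CREATE OR REPLACE TRIGGER {table}_AUD_TRG\n"
        f"AFTER INSERT OR UPDATE OR DELETE ON {table}\n"
        "FOR EACH ROW\n"
        "BEGIN\n"
        "  IF DELETING THEN\n"
        f"    INSERT INTO {aud} ({pk}) VALUES (:OLD.{pk});\n"
        "  ELSE\n"
        f"    INSERT INTO {aud} ({names}) VALUES ({new_vals});\n"
        "  END IF;\n"
        "END;"
    )
    return create_table, create_trigger


def generate_all(metadata, primary_keys):
    """metadata: {table: [(column, type), ...]}, e.g. built from user_tab_columns."""
    return {t: audit_ddl(t, cols, primary_keys[t]) for t, cols in metadata.items()}
```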
You don't need to write your own triggers.
Oracle ships with flexible and fine grained audit trail services. Have a look at this document (9i) as a starting point.
(Edit: Here's a link for 10g and 11g versions of the same document.)
You can audit so much that it can be like drinking from the firehose - and that can hurt the server performance at some point, or could leave you with so much audit information that you won't be able to extract meaningful information from it quickly, and/or you could end up eating up lots of disk space. Spend some time thinking about how much audit information you really need, and how long you might need to keep it around. To do so might require starting with a basic configuration, and then tailoring it down after you're able to get a sample of the kind of volume of audit trail data you're actually collecting.

Resources