Are increases to log retention retroactive? - google-cloud-logging

re: https://cloud.google.com/logging/docs/storage#logs-retention
If I have a 30 day retention policy, then increase that policy to 90 days, do logs that are 30 days old stick around for another 60 days?

A retention policy applies retroactively to existing objects in the bucket as well as to new objects added to the bucket. Please see the documentation for more details.
You can list your logging buckets and look for the retention period of the _Default bucket:
$ gcloud beta logging buckets list
You can also set it to a different number of retention days with:
$ gcloud beta logging buckets update _Default --location=global --retention-days=[RETENTION_DAYS]
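The same change can also be made from code. Below is a rough sketch, assuming the google-cloud-logging Java library's v2 ConfigClient; the project ID and the 90-day value are placeholders:

import com.google.cloud.logging.v2.ConfigClient;
import com.google.logging.v2.LogBucket;
import com.google.logging.v2.UpdateBucketRequest;
import com.google.protobuf.FieldMask;

public class UpdateRetention {
    public static void main(String[] args) throws Exception {
        // Placeholder project; the _Default bucket lives in the "global" location.
        String bucketName = "projects/my-project/locations/global/buckets/_Default";

        try (ConfigClient configClient = ConfigClient.create()) {
            UpdateBucketRequest request = UpdateBucketRequest.newBuilder()
                    .setName(bucketName)
                    .setBucket(LogBucket.newBuilder().setRetentionDays(90).build())
                    // Only the retention_days field should be updated.
                    .setUpdateMask(FieldMask.newBuilder().addPaths("retention_days").build())
                    .build();

            LogBucket updated = configClient.updateBucket(request);
            System.out.println("Retention is now " + updated.getRetentionDays() + " days");
        }
    }
}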

Log retention changes are retroactive.
I just finished testing this by adding a log message to a logs bucket with single-day retention and verifying that it was deleted after a day. I then added a new log message and increased retention to a week; two days later, the log remains.

Related

Best practices for expiring ES data based on a dynamic retention period

I'm not sure if that title makes much sense.
Right now, I have a fair amount of data coming in through Logstash - about 7-10 GB/day - and it all needs to stick around for 60 days. I currently write it to an index based on the current date (for example, "index-20220718") and just delete any index older than 60 days. That's easy.
But things are changing.
Soon I'm going to have data coming in that will have different, dynamic expiration dates. Some records need to stick around 15 days, some 30 days, some 365 days, some 3650 days. The retention period is in a field that's in the data.
So what's the best way to index this? I thought of using date math: adding the number of days in the retention field to the current date and storing the document in an index like "index-20220802" if it had a 30-day retention period, then deleting any index dated before today.
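Roughly, the date math I have in mind looks something like this (just an illustration; the index prefix and dates are made up):

import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class ExpiryIndexName {
    // Name the index after the date the document is allowed to expire,
    // e.g. a document ingested on 2022-07-03 with 30-day retention -> "index-20220802".
    static String expiryIndex(LocalDate ingestDate, int retentionDays) {
        LocalDate expiry = ingestDate.plusDays(retentionDays);
        return "index-" + expiry.format(DateTimeFormatter.BASIC_ISO_DATE);
    }

    public static void main(String[] args) {
        System.out.println(expiryIndex(LocalDate.of(2022, 7, 3), 30)); // prints index-20220802
        // A daily cleanup would then delete every index whose date suffix is before today.
    }
}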
Is this the best way to do it? Is it going to complicate searches? I'm just the sysadmin setting up the basic logstash/ES configuration - I'm not any sort of expert on ES or programming.
If a customer changes the retention period for their account, I guess we'd have to go through and re-index every one of their documents?
I feel like I must be missing other problems with this method too.
Is there a better way to do this that I'm just not seeing?
Thanks-

Auto record delete after 10 minutes in Postgresql and using Springboot

We are using Spring Boot and Spring Data with PostgreSQL, and I want to auto-delete newly inserted records in a table after 10 minutes.
Every new record should persist for only 10 minutes; after that it should be deleted automatically.
Is there any solution for auto-deleting records?
Note: I don't want a scheduler to do this job; please suggest any other solution.
Thanks
Yasir
The proper solution is not to delete the entries, but to add a creation timestamp and only select those rows that are no older than 10 minutes.
You can purge old records regularly, say every day, either by deleting them or (if you use partitioning) by dropping partitions.
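A minimal sketch of that approach with Spring Data JPA (the entity, column, and repository names are my own assumptions, not from the question; use javax.persistence instead of jakarta.persistence on Spring Boot 2.x):

import jakarta.persistence.Entity;
import jakarta.persistence.GeneratedValue;
import jakarta.persistence.Id;
import java.time.Instant;
import java.util.List;
import org.springframework.data.jpa.repository.JpaRepository;

@Entity
class Message {
    @Id @GeneratedValue
    Long id;

    String payload;

    // Set when the row is inserted; used to filter out "expired" rows when reading.
    Instant createdAt = Instant.now();
}

interface MessageRepository extends JpaRepository<Message, Long> {
    // Derived query: only rows newer than the cutoff are returned, so records
    // older than 10 minutes are effectively invisible without being deleted.
    List<Message> findByCreatedAtAfter(Instant cutoff);
}

// In a service:
//   List<Message> live = repository.findByCreatedAtAfter(Instant.now().minus(Duration.ofMinutes(10)));
// Old rows can then be purged in bulk later (e.g. once a day), or by dropping partitions.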

Is it possible to read actual applied immutable policy on a blob - remaining Time-based retention in days

I have applied a time-based retention policy, e.g. 5 days. Is it possible to read on each blob, or in its metadata, when this policy expires...
The same way we can read RemainingRetentionDays on soft-deleted blobs?
For a time-based immutability policy, there is no direct way to know when the policy expires; you need to calculate it yourself. For example, if the time-based retention is 5 days and you upload a blob on January 1, then on January 2 it has 4 days left until it expires.
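As a rough illustration of that calculation (plain Java date math; the creation time would come from the blob's properties, and the 5-day value is just the example above):

import java.time.Duration;
import java.time.OffsetDateTime;

public class RetentionMath {
    public static void main(String[] args) {
        // Assumed inputs: the blob's creation time (readable from its properties)
        // and the retention period configured on the time-based policy.
        OffsetDateTime creationTime = OffsetDateTime.parse("2023-01-01T00:00:00Z");
        int retentionDays = 5;

        OffsetDateTime policyExpiry = creationTime.plusDays(retentionDays);
        long daysRemaining = Duration.between(OffsetDateTime.now(), policyExpiry).toDays();

        System.out.println("Policy expires at: " + policyExpiry);
        System.out.println("Days remaining (approx.): " + Math.max(daysRemaining, 0));
    }
}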
For soft-deleted blobs, yes, you can read RemainingRetentionDays from code or in the UI.
To read this value via the UI, install Azure Storage Explorer; you can then see RemainingRetentionDays for each deleted blob. (Note: due to this bug, you should turn off the versioning feature.)

Kafka Streams KGroupedTable recovery

Suppose an aggregation like:
stream.groupByKey()
      .count()
      .toStream()
      .to(topic);
What happens after the default broker retention time (e.g. 1 week) has passed and the local state store of the count operation has to be recovered? Will it lose the counts of the keys removed by retention?
I think I missed the point that the changelog topic for count gets the configuration "cleanup.policy"="compact", which effectively sets the retention to infinity. Therefore no keys will be removed due to retention.
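To confirm this, one option is to describe the changelog topic's config with the plain Kafka Admin client. A sketch, where the broker address and the changelog topic name (Kafka Streams names them "<application.id>-<store-name>-changelog") are placeholders:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.common.config.ConfigResource;
import org.apache.kafka.common.config.TopicConfig;

public class ChangelogConfigCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker

        String changelogTopic = "my-app-KSTREAM-AGGREGATE-STATE-STORE-0000000001-changelog"; // placeholder name

        try (Admin admin = Admin.create(props)) {
            ConfigResource resource = new ConfigResource(ConfigResource.Type.TOPIC, changelogTopic);
            Config config = admin.describeConfigs(Collections.singleton(resource))
                                 .all().get()
                                 .get(resource);
            // A KTable changelog should report "compact", i.e. old values are compacted
            // per key rather than dropped by time-based retention.
            System.out.println(TopicConfig.CLEANUP_POLICY_CONFIG + " = "
                    + config.get(TopicConfig.CLEANUP_POLICY_CONFIG).value());
        }
    }
}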

Two insert sessions taking different times in production and UAT in Informatica

I have the same session running in production and UAT. All it does is select the data (around 6k rows in both environments), apply an Expression transformation (to hard-code a few columns), and then insert into a table (which does not have partitions).
The problem I am facing is that the PROD session is taking more than 30 minutes, whereas UAT is done within 5 minutes.
I have backtracked the timing over many days and it follows the same pattern. When I compared the session properties between the two, there was no difference at all.
When I checked the session log, it is the reading of rows that is taking time (same count and query in UAT as well). Could you please let me know how to proceed with this:
PROD:
Severity Timestamp Node Thread Message Code Message
INFO 4/26/2016 11:07:18 AM node02_WPPWM02A0004 WRITER_1_*_1 WRT_8167 Start loading table [FACT_] at: Tue Apr 26 01:37:17 2016
INFO 4/26/2016 11:26:48 AM node02_WPPWM02A0004 READER_1_1_1 BLKR_16019 Read [6102] rows, read [0] error rows for source table [STG_] instance name [STG]
UAT:
Severity Timestamp Node Thread Message Code Message
INFO 4/26/2016 11:40:53 AM node02_WUPWM02A0004 WRITER_1_*_1 WRT_8167 Start loading table [FACT] at: Tue Apr 26 01:10:53 2016
INFO 4/26/2016 11:43:10 AM node02_WUPWM02A0004 READER_1_1_1 BLKR_16019 Read [6209] rows, read [0] error rows for source table [STG] instance name [STG]
Follow the steps below:
1) Open the session log and search for 'Busy'.
2) Find the busy statistics that show a very high busy percentage.
3) If it is the reader, run the query in production and UAT and check the retrieval time. If it is high in production, then the query needs to be tuned, or indexes or partitions created at the table level and the Informatica level, etc. (depending on your project limitations).
4) If it is the writer, try increasing Informatica options such as 'Maximum memory allocated for auto memory attributes' and 'Maximum percentage of total memory allowed...', depending on your server configuration.
5) Also try to use Informatica partitions while loading into the target (provided the target is partitioned on a particular column).
6) Sometimes cache creation takes time because huge tables are used as lookups (check the busy percentage of the lookup as well). In that case the target also waits for rows to reach the writer thread, as they are still being transformed;
we then need to tune the lookup by overriding the default query with a tuned version of the query.
Also search for the following keywords:
"Timeout based Commit point" - generally occurs when a writer thread waits for a long time.
"No more lookup cache" - generally occurs when a huge data and index cache has to be built and there is no space available on disk, as multiple jobs run in production using the same cache folder.
Thanks and Regards
Raj
Perhaps you should check the query's explain plan in UAT and PROD. Working on the plan in PROD can help. A similar thing happened to me earlier: we checked the SQL plan and found that it was different in PROD compared to UAT, and we had to work with the DBAs to change the plan.
