Delete requests succeeded but records still exist in AWS ElasticSearch/OpenSearch

We have an AWS ElasticSearch/OpenSearch domain from which we deleted some records as part of a migration. According to our logs and spot checks of a few records, we successfully sent delete requests to ElasticSearch/OpenSearch, keyed by document id together with the index name and index type, using the JestClient execute API. However, the records still exist, which is creating an operational issue in the system. The delete requests raised no exceptions, and each response indicated the delete was successful (JestResult isSucceeded was true). Because of these non-deleted records we are currently deleting them manually whenever our users surface them, which is inefficient and operationally painful.
Could someone explain what scenarios can cause this kind of behaviour in ElasticSearch/OpenSearch, how to debug it, and how to avoid it if you have run into this before?
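For reference, a minimal sketch of the kind of delete call described above (the endpoint, index, type, and document id are placeholders); logging the response body, not just isSucceeded(), shows whether a document was actually removed and which concrete index the request hit:

    import io.searchbox.client.JestClient;
    import io.searchbox.client.JestClientFactory;
    import io.searchbox.client.config.HttpClientConfig;
    import io.searchbox.core.Delete;
    import io.searchbox.core.DocumentResult;

    public class DeleteAndVerify {
        public static void main(String[] args) throws Exception {
            // Placeholder endpoint, index, type and id -- replace with real values.
            JestClientFactory factory = new JestClientFactory();
            factory.setHttpClientConfig(
                    new HttpClientConfig.Builder("https://my-domain.us-east-1.es.amazonaws.com").build());
            JestClient client = factory.getObject();

            DocumentResult result = client.execute(
                    new Delete.Builder("doc-id-123")
                            .index("my-index")
                            .type("_doc")
                            .build());

            // Inspect the body, not only isSucceeded(): "result" is "deleted" or
            // "not_found", and "_index" shows the concrete index the request hit
            // (relevant if deletes go through an alias or the wrong index).
            System.out.println("succeeded: " + result.isSucceeded());
            System.out.println("http code: " + result.getResponseCode());
            System.out.println("_index:    " + result.getJsonObject().get("_index"));
            System.out.println("result:    " + result.getJsonObject().get("result"));
        }
    }

If "result" comes back as "deleted" but the document later reappears, that points to something re-indexing it afterwards rather than the delete itself failing.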

Related

Oracle ORDS: GET request returns old data, then after a period of time the changed data

I am having a problem with Oracle REST Data Services (ORDS) and I can't find a solution.
The problem is as follows:
We are using ORDS via a Tomcat web server and I have two endpoints defined, one to update a dataset and one to get all datasets from the table.
If I update a value via my endpoint, the change is written to the table, but if I then fetch the table through ORDS, it only responds with the old, unchanged data. After a certain period of time, while constantly retrying the GET, it responds with the expected values (after at most one minute, sometimes earlier).
Because of this behaviour I suspected some kind of caching, but I cannot find any such configuration in the Oracle database or in Tomcat.
Another point in favour of this theory: I logged what happens in my GET procedure and found that only the one request with the correct values gets logged, as if the others never happened.
The requests giving me the old values come back in the 4-8 ms range, while the request with the correct data takes 100-200 ms.
Thanks for your help :)
I tried logging what happens, but only the request with the fresh values was logged.
I tried restarting the Tomcat web server to make sure the cache was cleared, but this didn't fix the problem.
I searched for a configuration in ORDS or Oracle where a cache would be defined, but none was ever set.
I tried setting the value via a SQL UPDATE instead of the endpoint, but even then I see the change only after a delay.
Do you have a full overview of the communication path? Maybe there is a proxy in between?
If Tomcat has no caching configuration and you restarted the web server during your tests and still see the same issue, then there may be more to it; see the sketch below for a quick way to inspect the response headers.
Kind regards
M-Achilles
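One quick way to check for an intermediate cache along that path (a minimal sketch; the URL is a placeholder for your ORDS GET endpoint) is to dump the response headers and look for cache-related headers such as Age, Via, X-Cache, Cache-Control or ETag:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class OrdsHeaderCheck {
        public static void main(String[] args) throws Exception {
            // Placeholder URL -- point this at the ORDS GET endpoint that returns stale data.
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("https://your-host/ords/your_schema/your_module/datasets"))
                    .GET()
                    .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());

            // Headers such as Age, Via, X-Cache, Cache-Control or ETag usually reveal
            // whether a proxy in front of Tomcat served a cached copy of the response.
            System.out.println("status: " + response.statusCode());
            response.headers().map().forEach((name, values) ->
                    System.out.println(name + ": " + values));
        }
    }

A 4-8 ms response carrying an Age or X-Cache header would support the proxy theory; if no such headers appear, the fast responses are being produced somewhere else along the path.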

middy-ssm not picking up changes to the lambda's execution role

We're using middy-ssm to fetch and cache SSM parameter values during Lambda initialization. We ran into a situation where the Lambda's execution role did not have permission to perform SSM::GetParameters on the path it was attempting to fetch. We updated a policy on the role to allow access, but the Lambda function seemingly never picked up the permission change; instead it kept failing due to missing permissions until the end of its lifecycle (closer to an hour, as requests kept coming in to it).
I then ran a test where I fetched parameters using both the AWS SDK directly and middy-ssm. Initially the Lambda role didn't have permissions and both methods failed. We updated the policy, and after a couple of minutes the code that used the SDK was able to retrieve the parameter, but the middy middleware kept failing.
I tried to read the middy-ssm implementation to figure out whether the error result is somehow cached or what else might be going on, but couldn't pinpoint the issue. Any insight and/or suggestions on how to overcome this are welcome! Thanks!
So as pointed out by Will in the comments, this turned out to be a bug.

Cannot delete Lambda@Edge function created by CloudFormation

I cannot delete a Lambda@Edge function created by CloudFormation. During the CloudFormation creation process an error occurred and a rollback was executed. Afterwards we couldn't remove the Lambda that had been created; we resolved the CloudFormation problem, renamed the resource, and CloudFormation created a new Lambda, but the old one is still there. There is no CloudFront distribution or other resource linked to the old Lambda, and still we can't remove it. When we try, we receive this message:
An error occurred when deleting your function: Lambda was unable to delete arn:aws:lambda:us-east-1:326353638202:function:web-comp-cloud-front-fn-prod:2 because it is a replicated function. Please see our documentation for Deleting Lambda@Edge Functions and Replicas.
I know that if there are no resources linked to a Lambda@Edge function, the replicas are deleted after some time, but we can't find any linked resources.
Thank you in advance for your help.
I had a similar issue where I simply wasn't able to delete a Lambda@Edge function, and the following helped:
Create a new CloudFront distribution and associate your Lambda@Edge function with this new distribution.
Wait for the distribution to be fully deployed.
Remove the Lambda@Edge association from the CloudFront distribution you just created.
Wait for the distribution to be fully deployed.
Additionally, wait for a few more minutes.
Then try to delete your Lambda@Edge function.
The error message clearly indicates that the function is still replicated at the edge, which is why you cannot delete it. So you first have to remove the Lambda@Edge association before deleting the function. If they are created in the same stack, the easiest way is probably to set the Lambda function's DeletionPolicy to Retain and remove it manually afterwards.
Keep in mind that it can take up to a few hours, not minutes, before the replicas are deleted. Usually I just wait until the next day to remove them.
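If you end up scripting that cleanup, here is a minimal sketch (AWS SDK for Java v2; the function name, attempt count, and interval are placeholders) that simply retries the delete until the replicas have been removed:

    import java.time.Duration;
    import software.amazon.awssdk.regions.Region;
    import software.amazon.awssdk.services.lambda.LambdaClient;
    import software.amazon.awssdk.services.lambda.model.DeleteFunctionRequest;
    import software.amazon.awssdk.services.lambda.model.LambdaException;

    public class DeleteEdgeFunction {
        public static void main(String[] args) throws InterruptedException {
            // Lambda@Edge functions are managed in us-east-1; the name is a placeholder.
            LambdaClient lambda = LambdaClient.builder().region(Region.US_EAST_1).build();
            DeleteFunctionRequest request = DeleteFunctionRequest.builder()
                    .functionName("your-edge-function-name")
                    .build();

            // Replica cleanup can take hours after the CloudFront association is removed,
            // so retry slowly instead of giving up on the first "replicated function" error.
            for (int attempt = 1; attempt <= 12; attempt++) {
                try {
                    lambda.deleteFunction(request);
                    System.out.println("Deleted on attempt " + attempt);
                    return;
                } catch (LambdaException e) {
                    System.out.println("Attempt " + attempt + " failed: " + e.getMessage());
                    Thread.sleep(Duration.ofMinutes(30).toMillis());
                }
            }
            System.out.println("Gave up; replicas may still exist.");
        }
    }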

Terraform and OCI: "The existing Db System with ID <OCID> has a conflicting state of UPDATING" when creating multiple databases

I am trying to create 30 databases (oci_database_database resource) under 5 existing db_homes. All of these resources belong to a single DB System:
When applying my code, a first database is successfully created; then, when Terraform attempts to create the second one, I get the following error message: "Error: Service error: IncorrectState. The existing Db System with ID has a conflicting state of UPDATING", which causes the execution to stop.
If I re-apply my code, the second database is created, and then I get the same error when Terraform attempts to create the third one.
I am assuming I get this message because Terraform starts creating the next database as soon as the previous one is created, while the DB System status is not yet up to date (still 'UPDATING' instead of 'AVAILABLE').
A good way for the OCI provider to avoid this issue would be to consider a database creation complete only when the creation itself has finished AND the associated db home and DB System are back to 'AVAILABLE'.
Any suggestions on how to address the issue I am encountering?
Feel free to ask if you need any additional information.
Thank you.
As mentioned above, it looks like you have opened a ticket about this on GitHub. What you are experiencing should not happen, as Terraform should retry after seeing the error. As per your GitHub post, the person helping you needs your log with timestamps so they can troubleshoot further. At this stage I would recommend following up there and sharing the requested information.

BulkDeleteFailureBase - lots of "Not Enough Privilege" records

I've taken over support of a CRM 2016 On-Premise system. I don't know the history of this particular instance, but I suspect it has been copied and/or imported many times.
The BulkDeleteFailureBase table has just short of 2 million rows, almost all of which contain an error description like:
Not enough privilege to access the Microsoft Dynamics CRM object or perform the requested operation. The current Organizationid '<GUID1>' does not match with userOrTeam's organization id '<GUID2>'.
OrganisationBase has only one record with <GUID2> in it.
Has this happened because the instance has been copied/moved around incorrectly? If so, is it likely an indication that more problems are heading my way in the future?
How can I recover from this?
BulkDeleteFailureBase is one of the system async job logging tables where the platform captures run/success/failure logs.
Probably someone tried to clean up data such as the plugin trace log that was copied over from a different DB backup/restore or CRM org restoration. They used bulk delete, and everything that failed ended up here.
The MS Support recommendation includes a script to clean those tables safely; leaving the rows in place only gives you performance headaches.
