The issue is that our Kibana monitors keep disappearing by themselves and coming back after 2-3 minutes.
Normally, alerts were being created as expected.
The second, and most important, issue is that alerts are not triggering even though the condition result is true:
Condition Response true!!
Has anyone else faced this issue? I'm open to any advice. Thanks in advance.
I use the enhanced auto-remediation tool (https://github.com/PaloAltoNetworks/Prisma-Enhanced-Remediation#getting-started) to try to auto-remediate alerts detected in Prisma.
For some reason, alerts that cannot be remediated (because of missing permissions, errors, a deficiency in the runbook, or anything else) keep triggering their associated runbooks in Lambda over and over.
I noticed that this constant re-triggering happens when an alert is triggered for the first time and can't be fixed, either because of missing permissions or because the runbook runs correctly but doesn't actually fix the issue. In that case the Lambda (runbook) keeps being invoked for some period of time (which looks related to the Message retention period parameter in SQS), roughly every 30 minutes (which looks related to the visibility timeout parameter in SQS), regardless of whether the issue has since been fixed, manually or via an improved runbook.
When an alert comes in for the first time and is fixed immediately, there is no further re-triggering as described above.
I suspect that in the second scenario the runbook returns something that allows the alert to be removed from the queue. How should I handle the first scenario?
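I don't know the project's internals, but if the runbook Lambda is fed by an SQS event source mapping, the general SQS behavior is that a message is only deleted when the invocation succeeds (or explicitly reports the item as processed); otherwise it reappears after the visibility timeout until the retention period expires. Below is a minimal sketch of handling the "permanent failure" case, assuming an SQS-triggered Lambda with ReportBatchItemFailures enabled; remediate() and isPermanentError() are placeholders, and this is TypeScript to illustrate the pattern rather than the project's own Python runbooks:

```ts
import type { SQSEvent, SQSBatchResponse, SQSBatchItemFailure } from "aws-lambda";

export const handler = async (event: SQSEvent): Promise<SQSBatchResponse> => {
  const failures: SQSBatchItemFailure[] = [];

  for (const record of event.Records) {
    try {
      const alert = JSON.parse(record.body);
      await remediate(alert); // placeholder for the actual runbook logic
    } catch (err) {
      if (isPermanentError(err)) {
        // Retrying will never succeed (e.g. AccessDenied), so log and swallow the
        // error: the message gets deleted instead of reappearing after the
        // visibility timeout.
        console.error("Permanent failure, dropping alert", record.messageId, err);
      } else {
        // Transient failure: report this message so only it is retried.
        failures.push({ itemIdentifier: record.messageId });
      }
    }
  }

  // With ReportBatchItemFailures enabled on the event source mapping, only the
  // listed messages return to the queue; everything else is deleted.
  return { batchItemFailures: failures };
};

// Placeholders: a real implementation would call the runbook and inspect the
// AWS error code to decide whether a retry could ever succeed.
async function remediate(alert: unknown): Promise<void> {
  /* runbook call goes here */
}

function isPermanentError(err: unknown): boolean {
  return err instanceof Error && /AccessDenied/i.test(err.message);
}
```

Alternatively, a redrive policy with a dead-letter queue and a small maxReceiveCount on the source queue would bound how many times an unfixable alert can re-trigger the runbook.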
We have a subscription to an RxJS Observable obtained from the Sanity JavaScript client's listen method.
This works fine, except that every now and then we get the error "The operation timed out". I haven't been able to pinpoint exactly when and where this arises, but I suspect it happens after a certain period without the subscription receiving any message. In our case, however, this does not indicate an actual problem.
I'm not well versed in observables; is there something basic I'm missing, or has anyone had a similar issue?
Listeners are currently automatically closed after 5 minutes. This might be what you're encountering.
It's actually a regression that we discovered recently; listeners are supposed to time out only after 30 minutes. We are expecting a fix for it this week. Edit: The fix has now been released.
It's important for a client to be resilient against any kind of error, though. On the Internet, network timeouts and other glitches are of course very common and must be handled appropriately. Eventually the listener will close itself anyway, as this is the intended behaviour.
(I'm a developer at Sanity.)
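For anyone hitting this, here is a minimal sketch of one way to keep a listener alive by resubscribing when it errors or is closed. The query, delays and retry count are placeholders, and it assumes RxJS 7+ and a recent @sanity/client, so adjust to your versions:

```ts
import { createClient } from "@sanity/client";
import { retry, repeat } from "rxjs";

// Placeholder client configuration.
const client = createClient({
  projectId: "your-project-id",
  dataset: "production",
  apiVersion: "2023-01-01",
  useCdn: false,
});

const subscription = client
  .listen('*[_type == "post"]') // placeholder query
  .pipe(
    // Resubscribe 5s after an error such as "The operation timed out".
    retry({ count: 10, delay: 5000, resetOnSuccess: true }),
    // Resubscribe if the server closes the stream cleanly (listener timeout).
    repeat({ delay: 5000 })
  )
  .subscribe({
    next: (update) => console.log("mutation received", update),
    error: (err) => console.error("giving up after repeated failures", err),
  });

// Later, when updates are no longer needed:
// subscription.unsubscribe();
```

Whether you resubscribe automatically or treat a clean close as the signal to refetch is up to your app; the key point is simply not to assume the stream stays open forever.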
I am trying to monitor my instances and get an alert when one shuts down. For this I have configured an alert policy in Stackdriver as below:
Metric Absence Condition
Violates when: CPU Usage (GCE Monitoring) is absent for greater than 5 minutes
It worked only the first time and then never created an incident for any of the stopped instances.
What am I missing here?
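For reference, here is roughly the same condition expressed through the Cloud Monitoring API, in case it helps reproduce the setup. This is only a sketch using the @google-cloud/monitoring Node.js client; the metric filter, durations and names are my approximation of the console configuration above, not an exact export of it:

```ts
import * as monitoring from "@google-cloud/monitoring";

async function createAbsencePolicy(projectId: string) {
  const client = new monitoring.AlertPolicyServiceClient();

  const [policy] = await client.createAlertPolicy({
    name: client.projectPath(projectId),
    alertPolicy: {
      displayName: "VM CPU metrics absent",
      combiner: "OR",
      conditions: [
        {
          displayName: "CPU usage absent for 5 minutes",
          conditionAbsent: {
            // Approximation of "CPU Usage (GCE Monitoring)".
            filter:
              'resource.type = "gce_instance" AND ' +
              'metric.type = "compute.googleapis.com/instance/cpu/utilization"',
            duration: { seconds: 300 }, // absent for greater than 5 minutes
            aggregations: [
              { alignmentPeriod: { seconds: 60 }, perSeriesAligner: "ALIGN_MEAN" },
            ],
          },
        },
      ],
    },
  });

  console.log("Created policy:", policy.name);
}

createAbsencePolicy("my-project-id").catch(console.error); // placeholder project ID
```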
Expected Behavior
@Jyotsna, I investigated this issue a bit more and was able to confirm that this is currently expected behavior: alert conditions aren't triggered by inactive instances, which explains why you won't get any alerts when the VM instance's CPU is not registering any metrics. However, there is currently a feature request in progress to update this behavior.
Known Issue
There's also a known issue that appears to cause no logs to be sent to Stackdriver on subsequent violations of the policy, even after the offending VM is back online. This explains why:
It worked only the first time and then never created an incident for any of the stopped instances.
Hence, alerting on stopped instances won't work properly until the issue is fixed. Unfortunately, there's no ETA on this, but it will eventually be addressed.
I have been experiencing an issue where occasionally Kibana stops working, citing a timeout connecting to Elasticsearch as the cause (I have Marvel installed). The error is something like: "plugin:elasticsearch Request Timeout".
Usually these go away by the next day, and occasionally I have been able to regain access to my data by increasing the timeout on Kibana. However, I can't figure out how to troubleshoot this issue. I suspect ES may be storing some extremely large individual documents, but I cannot find them; there are just too many logs to dig through by hand.
My Elasticsearch cluster is perfectly healthy (green on the health check), even when Kibana cannot access it.
Where can I start troubleshooting why we are getting timeouts here? When I expand the timeout window, Kibana comes back and everything works fine.
Any tips on where to start searching would be enormously appreciated!!
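In case it's useful to anyone else, the timeout I've been raising is the Elasticsearch request timeout in kibana.yml. The key name depends on the Kibana version, and the value below is just an example of what I set, not a recommendation:

```yaml
# Kibana 4.x: time in milliseconds to wait for responses from Elasticsearch
request_timeout: 120000

# Kibana 5.x and later: same setting, new name
elasticsearch.requestTimeout: 120000
```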
I was away from the computers for a few days, and when I got back to work I found this very strange activity on my app requests graph.
I had nothing running the whole weekend.
It looks like something is monitoring it every 3 seconds.
Do you know what it could be, or what to check?
You can go to your Admin console and from there check the logs for your app
(under "Monitoring -> Logs"). This will tell you which requests are causing the entries to appear. Without access to your logs, that's the best I can offer from here.
Also, 0.033 requests a second works out to one request every 30 seconds (1 / 0.033 ≈ 30), not every 3.