Icinga2 notification just once on state change - icinga2

I have set up icinga2 to monitor a few services with different intervals, so one service might be checked every 10 seconds. If it goes critical I receive a notification, but I keep receiving it every 10 seconds for as long as the error persists, or until I acknowledge it. I only want to receive it once per state change. Maybe again after a specified time, but that is not as important.
Here is my config:
This is more or less the standard template.conf, but I have added interval = 0s because I read that it should prevent notifications from being sent multiple times.
template Notification "mail-service-notification" {
  command = "mail-service-notification"
  interval = 0s
  states = [ OK, Critical ]
  types = [ Problem, Acknowledgement, Recovery, Custom,
            FlappingStart, FlappingEnd,
            DowntimeStart, DowntimeEnd, DowntimeRemoved ]
  vars += {
    notification_logtosyslog = false
  }
  period = "24x7"
}
And here is the part of the notification.conf that includes the template:
object NotificationCommand "telegram-service-notification" {
  import "plugin-notification-command"
  command = [ SysconfDir + "/icinga2/scripts/telegram-service-notification.sh" ]
  env = {
    NOTIFICATIONTYPE = "$notification.type$"
    SERVICEDESC = "$service.name$"
    HOSTNAME = "$host.name$"
    HOSTALIAS = "$host.display_name$"
    HOSTADDRESS = "$address$"
    SERVICESTATE = "$service.state$"
    LONGDATETIME = "$icinga.long_date_time$"
    SERVICEOUTPUT = "$service.output$"
    NOTIFICATIONAUTHORNAME = "$notification.author$"
    NOTIFICATIONCOMMENT = "$notification.comment$"
    HOSTDISPLAYNAME = "$host.display_name$"
    SERVICEDISPLAYNAME = "$service.display_name$"
    TELEGRAM_BOT_TOKEN = TelegramBotToken
    TELEGRAM_CHAT_ID = "$user.vars.telegram_chat_id$"
  }
}
apply Notification "telegram-icingaadmin" to Service {
  import "mail-service-notification"
  command = "telegram-service-notification"
  user_groups = [ "icingaadmins" ]
  assign where host.name
}

I think you had a typo.
It should work if you set interval = 0 (not interval = 0s).
After that change you must restart the Icinga service.
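Applied to the template from the question, only the interval line changes (a minimal sketch; everything else is copied as-is):
template Notification "mail-service-notification" {
  command = "mail-service-notification"
  interval = 0 // 0 disables re-notifications; the initial problem/recovery notifications are still sent
  states = [ OK, Critical ]
  types = [ Problem, Acknowledgement, Recovery, Custom,
            FlappingStart, FlappingEnd,
            DowntimeStart, DowntimeEnd, DowntimeRemoved ]
  vars += {
    notification_logtosyslog = false
  }
  period = "24x7"
}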

Related

Terraform starting EC2 sometimes stuck on "Still creating" until timeout

I am running a Terraform job through Jenkins which starts up an EC2 instance and then runs a shell script on it using user_data. I run this job 23 times in parallel, and for some reason each time only a few of the runs (anywhere from 1 to 8, and always different indices) hang on "aws_instance.genomic-etl-ec2: Still creating..." until the connection times out after approximately an hour and throws a RequestExpired error, with no further details on why. The other instances start fine within around 2-3 minutes each.
My resource:
data "template_file" "my-user_data" {
template = file("scripts/my_script.sh")
}
data "template_cloudinit_config" "my-user-data" {
gzip = true
base64_encode = true
# user_data
part {
content_type = "text/x-shellscript"
content = data.template_file.my-user_data.rendered
}
}
resource "aws_instance" "genomic-etl-ec2" {
ami = var.ami-id
instance_type = "m5.12xlarge"
associate_public_ip_address = true
subnet_id = var.my-subnet-us-east-id
iam_instance_profile = "my-deployment-profile"
user_data = data.template_cloudinit_config.my-user-data.rendered
vpc_security_group_ids = [
aws_security_group.my-sg1.id,
aws_security_group.my-sg2.id
]
root_block_device {
delete_on_termination = true
encrypted = true
volume_size = 1000
}
provisioner "local-exec" {
command = "sleep 40"
}
tags = {
Owner = "Me"
Environment = "development"
Name = "My EC2 - ${id}"
automaticPatches = "1"
}
}
Sometimes AWS instances take a long time to become fully available. It's not uncommon for those to take longer than Terraform's default timeout, causing Terraform to fail.
As per the official documentation on the Terraform aws_instance resource, the create timeout defaults to 10 minutes. If a particular instance type is taking longer than 10 minutes to become available, then you need to increase the create timeout setting:
resource "aws_instance" "genomic-etl-ec2" {
# ...
timeouts {
create = "20m"
}
}

Triggering a Lambda once a DMS Replication Task has completed in Terraform

I would like to trigger a Lambda once a DMS Replication Task has successfully completed. I have the following Terraform code, which successfully creates all the assets, but my Lambda is not being triggered.
resource "aws_dms_event_subscription" "my_event_subscription" {
enabled = true
event_categories = ["state change"]
name = "my-event-subscription"
sns_topic_arn = aws_sns_topic.my_event_subscription_topic.arn
source_ids = ["my-replication-task"]
source_type = "replication-task"
}
resource "aws_sns_topic" "my_event_subscription_topic" {
name = "my-event-subscription-topic"
}
resource "aws_sns_topic_subscription" "my_event_subscription_topic_subscription" {
topic_arn = aws_sns_topic.my_event_subscription_topic.arn
protocol = "lambda"
endpoint = aws_lambda_function.my_lambda_function.arn
}
resource "aws_sns_topic_policy" "allow_publish" {
arn = aws_sns_topic.my_event_subscription_topic.arn
policy = data.aws_iam_policy_document.allow_dms_and_events_document.json
}
resource "aws_lambda_permission" "allow_sns_invoke" {
statement_id = "AllowExecutionFromSNS"
action = "lambda:InvokeFunction"
function_name = aws_lambda_function.my_lambda_function.function_name
principal = "sns.amazonaws.com"
source_arn = aws_sns_topic.my_event_subscription_topic.arn
}
data "aws_iam_policy_document" "allow_dms_and_events_document" {
statement {
actions = ["SNS:Publish"]
principals {
identifiers = [
"dms.amazonaws.com",
"events.amazonaws.com"
]
type = "Service"
}
resources = [aws_sns_topic.my_event_subscription_topic.arn]
}
}
Am I missing something?
Is event_categories = ["state change"] correct? (This suggests "state change" is correct.
I'm less concerned right now whether the Lambda is triggered for every state change rather than just DMS-EVENT-0079.)
Is there something I can add to get CloudWatch logs from the event subscription, to tell me what's wrong?
You can try giving it the event JSON as shared in the AWS documentation:
{
  "version": "0",
  "id": "11a11b11-222b-333a-44d4-01234a5b67890",
  "detail-type": "DMS Replication Task State Change",
  "source": "aws.dms",
  "account": "0123456789012",
  "time": "1970-01-01T00:00:00Z",
  "region": "us-east-1",
  "resources": [
    "arn:aws:dms:us-east-1:012345678901:task:AAAABBBB0CCCCDDDDEEEEE1FFFF2GGG3FFFFFF3"
  ],
  "detail": {
    "type": "ReplicationTask",
    "category": "StateChange",
    "eventType": "REPLICATION_TASK_STARTED",
    "eventName": "DMS-EVENT-0069",
    "resourceLink": "https://console.aws.amazon.com/dms/v2/home?region=us-east-1#taskDetails/taskName",
    "detailMessage": "Replication task started, with flag = fresh start"
  }
}
You can check how to express this as JSON in Terraform here.
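As a hedged sketch of that approach, an EventBridge rule matching the event JSON above could be wired to the Lambda instead of the DMS-to-SNS subscription (the rule name and pattern are illustrative; my_lambda_function refers to the Lambda from the question):
resource "aws_cloudwatch_event_rule" "dms_task_state_change" {
  name = "dms-replication-task-state-change"

  # Match the DMS event shown above; the pattern could be narrowed further
  # if only specific event IDs (e.g. DMS-EVENT-0079) should trigger the Lambda.
  event_pattern = jsonencode({
    source        = ["aws.dms"]
    "detail-type" = ["DMS Replication Task State Change"]
  })
}

resource "aws_cloudwatch_event_target" "dms_task_state_change_lambda" {
  rule = aws_cloudwatch_event_rule.dms_task_state_change.name
  arn  = aws_lambda_function.my_lambda_function.arn
}

resource "aws_lambda_permission" "allow_eventbridge_invoke" {
  statement_id  = "AllowExecutionFromEventBridge"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.my_lambda_function.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.dms_task_state_change.arn
}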

Messages are not moved to DLQ

I'm using ElasticMQ (via docker image v1.3.3) and can't get the DLQ to work.
This is my elasticmq.conf:
include classpath("application.conf")
node-address {
protocol = http
host = localhost
port = 9324
context-path = ""
}
rest-sqs {
enabled = true
bind-port = 9324
bind-hostname = "0.0.0.0"
// Possible values: relaxed, strict
sqs-limits = strict
}
rest-stats {
enabled = true
bind-port = 9325
bind-hostname = "0.0.0.0"
}
queues {
main {
defaultVisibilityTimeout = 10 seconds
delay = 2 seconds
receiveMessageWait = 0 seconds
deadLettersQueue {
name = "retry"
maxReceiveCount = 1
}
}
retry {
defaultVisibilityTimeout = 10 seconds
delay = 2 seconds
receiveMessageWait = 0 seconds
}
deadletter {
defaultVisibilityTimeout = 10 seconds
delay = 2 seconds
receiveMessageWait = 0 seconds
}
}
I'm sending a message like so (using the AWS CLI):
aws --endpoint-url http://localhost:9324 sqs send-message --queue-url http://localhost:9324/queue/main --message-body "Hello, queue"
And receiving it like so:
aws --endpoint-url http://localhost:9324 sqs receive-message --queue-url http://localhost:9324/queue/main --wait-time-seconds 10
I'm not deleting the message from the queue but the message is being deleted and not being moved to the DLQ (i.e., the retry queue). I'm also trying to receive the message with Java code and getting the same result.
Why is that?

Rocketchat integration with AWX Tower notification

I'm looking for a way to integrate notifications from Ansible Tower / AWX with Rocket.Chat. I can't find a suitable script for a Rocket.Chat integration.
First, go to Administration > Integration in Rocket.Chat and create a new incoming webhook. Configure it as desired (name, bot, channel, etc.), enable scripting, and add the following script:
class Script {
  process_incoming_request({ request }) {
    // UNCOMMENT THE BELOW LINE TO DEBUG IF NEEDED.
    // console.log(request.content);
    let body = request.content.body;
    if (!body) {
      let id = request.content.id;
      let name = request.content.name;
      let url = request.content.url;
      let status = request.content.status;
      let type = request.content.friendly_name;
      let project = request.content.project;
      let playbook = request.content.playbook;
      let hosts = request.content.hosts;
      let created_by = request.content.created_by;
      let started = request.content.started;
      let finished = request.content.finished;
      let traceback = request.content.traceback;
      let inventory = request.content.inventory;
      let credential = request.content.credential;
      let limit = request.content.limit;
      let extra_vars = request.content.extra_vars;
      let message = "";
      message += "AWX "+type+" "+name+" ("+id+") ";
      message += "on project _"+project+"_ ";
      message += "running playbook _"+playbook+"_ ";
      message += "has status *"+status+"*.";
      message += "\n";
      message += type+" was created by _"+created_by+"_ for inventory _"+inventory+"_ ";
      if (limit !== "") {
        message += "with limit _"+limit+"_ ";
      }
      message += " and using the _"+credential+"_ credentials.\n";
      if (Object.keys(hosts).length != 0) {
        message += "Hosts: "+Object.keys(hosts).length+" (ok/changed/skipped/failures)\n";
        for (let [name, host] of Object.entries(hosts)) {
          message += "- "+name+" ("+host.ok+"/"+host.changed+"/"+host.skipped+"/"+host.failures+")";
          if (host.failed === false) {
            message += " is *ok*\n";
          } else {
            message += " has *failed*\n";
          }
        }
      }
      return {
        content: {
          "text": "AWX notification *"+status+"* on "+type+" "+name+" ("+id+")",
          "attachments": [
            {
              "title": type+": "+name+"",
              "title_link": url,
              "text": message,
              "color": "#764FA5"
            }
          ]
        }
      };
    } else {
      return {
        content: {
          text: "AWX notification: " + request.content.body
        }
      };
    }
  }
}
Save and activate the webhook. Now you get a Webhook URL from Rocket.Chat. Copy that URL.
Go to your AWX instance and create a new Notification of type Webhook and paste the Webhook URL from Rocket.Chat. You can test the notification within AWX.
The script does not print extra vars, because they could contain passwords etc. But you'll see failed hosts and some more information about the job.
AWX/Tower has the ability to send notifications to Rocket.Chat without any custom scripts.
In Tower, go to Notifications and add a new one with type 'Rocket.Chat', then set the Target URL to the URL of a blank incoming webhook in Rocket.Chat (make sure the webhook is enabled at the top).
(Note: be careful with the URL Rocket.Chat gives you for the integration; mine didn't include the correct port of 3000 in the URL, so it failed at first.)
Here's what the notifications read as:
Bot -
3:13 PM
Tower Notification Test 1 https://ruupansi01
Bot -
3:15 PM
Project Update #2 'Test Project' succeeded: https://tower/#/jobs/project/1

Akka.Net Clustering Simple Explanation

I'm trying to set up a simple cluster using Akka.NET.
The goal is to have a server receive requests and have Akka.NET process them through the cluster.
For testing and learning I created a simple WCF service that receives a math equation, and I want to send this equation off to be solved.
I have one server project and one client project.
The configuration on the server side is:
<![CDATA[
akka {
  actor {
    provider = "Akka.Cluster.ClusterActorRefProvider, Akka.Cluster"
    debug {
      receive = on
      autoreceive = on
      lifecycle = on
      event-stream = on
      unhandled = on
    }
    deployment {
      /math {
        router = consistent-hashing-group #round-robin-pool # routing strategy
        routees.paths = [ "/user/math" ]
        virtual-nodes-factor = 8
        #nr-of-instances = 10 # max number of total routees
        cluster {
          enabled = on
          max-nr-of-instances-per-node = 2
          allow-local-routees = off
          use-role = math
        }
      }
    }
  }
  remote {
    helios.tcp {
      transport-class = "Akka.Remote.Transport.Helios.HeliosTcpTransport, Akka.Remote"
      applied-adapters = []
      transport-protocol = tcp
      port = 8081
      hostname = "127.0.0.1"
    }
  }
  cluster {
    seed-nodes = ["akka.tcp://ClusterSystem#127.0.0.1:8081"] # address of seed node
  }
}
]]>
On the client side the configuration is like this:
<![CDATA[
akka {
  actor.provider = "Akka.Cluster.ClusterActorRefProvider, Akka.Cluster"
  remote {
    log-remote-lifecycle-events = DEBUG
    log-received-messages = on
    helios.tcp {
      transport-class = "Akka.Remote.Transport.Helios.HeliosTcpTransport, Akka.Remote"
      applied-adapters = []
      transport-protocol = tcp
      port = 0
      hostname = 127.0.0.1
    }
  }
  cluster {
    seed-nodes = ["akka.tcp://ClusterSystem#127.0.0.1:8081"] # address of the seed node
    roles = ["math"] # roles this member is in
  }
  actor.deployment {
    /math {
      router = round-robin-pool # routing strategy
      routees.paths = ["/user/math"]
      nr-of-instances = 10 # max number of total routees
      cluster {
        enabled = on
        allow-local-routees = on
        use-role = math
        max-nr-of-instances-per-node = 10
      }
    }
  }
}
]]>
The cluster connection seems to be made correctly. I see the status [UP], and the association with the role "math" appears on the server side.
Even following the WebCrawler example, I haven't managed to get a message delivered. I always get deadletters.
I tried like this:
actor = sys.ActorOf(Props.Empty.WithRouter(FromConfig.Instance), "math");
or
var actor = sys.ActorSelection("/user/math");
Does someone know a good tutorial, or could someone help me?
Thanks
Some remarks:
First: assuming you're sending work from the server to the client, you are effectively remote-deploying actors on your client.
That means only the server node needs the actor.deployment config section.
The client only needs the default cluster config (and your role setting, of course).
Second: try to make it simpler first. Use a round-robin-pool instead; it's much simpler. Try to get that working, and work your way up from there.
This way it's easier to eliminate configuration/network/other issues.
Your usage actor = sys.ActorOf(Props.Empty.WithRouter(FromConfig.Instance), "math"); is correct.
A sample of how your round-robin-pool config could look:
deployment {
  /math {
    router = round-robin-pool # routing strategy
    nr-of-instances = 10 # max number of total routees
    cluster {
      enabled = on
      max-nr-of-instances-per-node = 2
      allow-local-routees = off
      use-role = math
    }
  }
}
Try this out. And let me know if that helps.
Edit:
OK, after looking at your sample, here are some things I changed.
ActorManager->Process: you're creating a new router actor per request. Don't do that. Create the router actor once and reuse the IActorRef (a sketch of this is at the end of the answer).
Got rid of the minimal cluster size settings in the MathAgentWorker project.
Since you're not using remote actor deployment, I changed the round-robin-pool to a round-robin-group.
After that it worked.
Also remember that if you're using the consistent-hashing-group router you need to specify the hashing key. There are various ways to do that; in your sample I think the easiest way would be to wrap the message you're sending to your router in a ConsistentHashableEnvelope. Check the docs for more information.
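For illustration, a minimal sketch of such a wrapper (the SolveEquation type, its Equation property, and the router variable are made up for the example):
using Akka.Routing;

// Hypothetical work message; the equation string doubles as the hashing key here.
var work = new SolveEquation("2+2");

// The ConsistentHashableEnvelope tells the consistent-hashing-group router
// which key to hash on, without changing the message type itself.
router.Tell(new ConsistentHashableEnvelope(work, work.Equation));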
Finally the akka deployment sections looked like this:
deployment {
  /math {
    router = round-robin-group # routing strategy
    routees.paths = ["/user/math"]
    cluster {
      enabled = on
      allow-local-routees = off
      use-role = math
    }
  }
}
On the MathAgentWorker I only changed the cluster section, which now looks like this:
cluster {
  seed-nodes = ["akka.tcp://ClusterSystem#127.0.0.1:8081"] # address of the seed node
  roles = ["math"] # roles this member is in
}
And the only thing that the ActorManager.Process does is:
return await Program.Instance.RouterInstance.Ask<TResult>(msg, TimeSpan.FromSeconds(10));
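A minimal sketch of creating the router once and reusing it, matching the Program.Instance.RouterInstance reference above (the surrounding class layout is illustrative):
using Akka.Actor;
using Akka.Routing;

public class Program
{
    public static Program Instance { get; } = new Program();

    // Created once at startup and reused for every request,
    // instead of creating a new router actor per call.
    public IActorRef RouterInstance { get; }

    private readonly ActorSystem _system;

    private Program()
    {
        // Loads the akka { ... } HOCON section from App.config, as in the question.
        _system = ActorSystem.Create("ClusterSystem");

        // "math" resolves the /math deployment section, i.e. the round-robin-group router.
        RouterInstance = _system.ActorOf(
            Props.Empty.WithRouter(FromConfig.Instance), "math");
    }
}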
