Does AWS SSM agent cloudwatch resource:memory event? - amazon-ec2

Am picking ECS optimised instance(ami-05958d7635caa4d04) in data plane of ECS in ca-central-1 region.
AWS Systems Manager Agent (SSM Agent) is Amazon software that can be installed and configured on an Amazon EC2 instance, an on-premises server, or a virtual machine (VM). SSM Agent makes it possible for Systems Manager to update, manage, and configure these resources.
In my scenario, Launching a ECS task in ECS optimised instance(ami-05958d7635caa4d04), causes resource:memory error. More on this error, here. Monitoring ECS->cluster->service->events will not work for me, because cloudformation roll back the cluster.
My existing ECS optimised instance is launched as shown below:
"EC2Instance":{
"Type": "AWS::EC2::Instance",
"Properties":{
"ImageId": "ami-05958d7635caa4d04",
"InstanceType": "t2.micro",
"SubnetId": { "Ref": "SubnetId"},
"KeyName": { "Ref": "KeyName"},
"SecurityGroupIds": [ { "Ref": "EC2InstanceSecurityGroup"} ],
"IamInstanceProfile": { "Ref" : "EC2InstanceProfile"},
"UserData":{
"Fn::Base64": { "Fn::Join": ["", [
"#!/bin/bash\n",
"echo ECS_CLUSTER=", { "Ref": "EcsCluster" }, " >> /etc/ecs/ecs.config\n",
"groupadd -g 1000 jenkins\n",
"useradd -u 1000 -g jenkins jenkins\n",
"mkdir -p /ecs/jenkins_home\n",
"chown -R jenkins:jenkins /ecs/jenkins_home\n"
] ] }
},
"Tags": [ { "Key": "Name", "Value": { "Fn::Join": ["", [ { "Ref": "AWS::StackName"}, "-instance" ] ]} }]
}
}
1) Does aws ssm agent installation required on ECS instance(ami-05958d7635caa4d04) to retrieve such cloudwatch events(resource:memory) with aws.ssm cloudwatch event rule filter? or Does aws.ec2 cloudwatch event rule filter suffice?
2) If yes, Do I need to explicitly install SSM agent on ECS instance(ami-05958d7635caa4d04)? through CloudFormation...

You don't need to install SSM agent to monitor something such as memory usage of your instance (whether container instance or not). This is domain of CloudWatch, not SSM.
All you need to install is unified cloud watch agent and configure it accordingly. This is where SSM can help but it is not necessary and you can install it manually (or via script if you want).
If you decide to use SSM then you will need to explicitly install it. It comes preinstalled on some OSes but not on Amazon ECS-Optimized AMI - more about this.

Related

unable to control swarm ingress network with ansible

I'm deploying Docker swarm with ansible and I would like to ensure the ingress network has been created. In that aim, I configured the following task :
- name: Ensure ingress network exists
docker_network:
state: present
name: ingress
driver: overlay
driver_options:
ingress: true
And I'm getting the following error :
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: docker.errors.NotFound: 404 Client Error for http+docker://localhost/v1.41/networks/ingress/disconnect: Not Found ("No such container: ingress-endpoint")
fatal: [swarm-srv-1]: FAILED! => {"changed": false, "msg": "An unexpected docker error occurred: 404 Client Error for http+docker://localhost/v1.41/networks/ingress/disconnect: Not Found (\"No such container: ingress-endpoint\")"}
I've tried to add some arguments likes :
scope: swarm
force: yes
But no changes... I've also tried to delete the ingress with ansible (state: absent), but I always get the same error.
Note that I don't face any issue when trying to delete a recreate the ingress network manually on the swarm : docker network rm ingress
I don't know how to resolve that issue...Any help would be appreciated. Thanks !
Here are some informations that may help...
# docker version
Version: 20.10.6
API version: 1.41
Go version: go1.13.15
Git commit: 370c289
Built: Fri Apr 9 22:47:35 2021
OS/Arch: linux/amd64
# docker inspect ingress
[
{
"Name": "ingress",
"Id": "yb2tkhep8vtaj9q7w3mssc9lx",
"Created": "2021-05-19T05:53:27.524446929-04:00",
"Scope": "swarm",
"Driver": "overlay",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": null,
"Config": [
{
"Subnet": "10.0.0.0/24",
"Gateway": "10.0.0.1"
}
]
},
"Internal": false,
"Attachable": false,
"Ingress": true,
"ConfigFrom": {
"Network": ""
},
"ConfigOnly": false,
"Containers": {
"ingress-sbox": {
"Name": "ingress-endpoint",
"EndpointID": "dfdc0f123d21a196c7a815c7e0a886924d0799ae5f3be2d38b64d527ed4620b1",
"MacAddress": "02:42:0a:00:00:02",
"IPv4Address": "10.0.0.2/24",
"IPv6Address": ""
}
},
"Options": {
"com.docker.network.driver.overlay.vxlanid_list": "4096"
},
"Labels": {},
"Peers": [
{
"Name": "8f8932d6f99f",
"IP": "(ip address here)"
},
{
"Name": "28b9ca95dcf0",
"IP": "(ip address here)"
},
{
"Name": "f7c48c8af2f5",
"IP": "(ip address here)"
}
]
}
]
I had the exact same issue when trying to customize the IP range of the ingress network. It looks like the docker_network module does not support modification of swarm specific networks: there is a open Github issue for this.
I went for the ugly workaround of removing the network by executing it through a shell (docker network rm ingress command) and adding it again. When adding it with the docker_network module, I found that adding also seems not be working (fails to set the ingress property of the network). So I ended up doing both remove- and create operation through a shell command.
Since the removal will trigger a confirmation dialogue:
WARNING! Before removing the routing-mesh network, make sure all the nodes in your swarm run the same docker engine version. Otherwise, removal may not be effective and functionality of newly create ingress networks will be impaired.
Are you sure you want to continue? [y/N]
I used the expect module to confirm the dialogue:
- name: remove default ingress network
ansible.builtin.expect:
command: docker network rm ingress
responses:
"[y/N]": "y"
- name: create customized ingress network
shell: "docker network create --ingress --subnet {{ docker_ingress_network }} --driver overlay ingress"
It is not perfect but it works.
There was one last problem I experienced: when running it on an existing swarm I ended up having network issues on the node where I did run this (somehow the docker_gwbridge network on that node could not handle the change). The fix for this was to fully remove the node and re-join the swarm (regenerates the docker_gwbridge).

Traefik with dynamic routing to ECS backends, running as one-off tasks

I'm triying to implement solution for reverse-proxy service using Traefik v1 (1.7) and ECS one-off tasks as backends, as described in this SO question. Routing should by dynamic - requests to /user/1234/* path should go to the ECS task, running with the appropriate docker labels:
docker_labels = {
traefik.frontend.rule = "Path:/user/1234"
traefik.backend = "trax1"
traefik.enable = "true"
}
So far this setup works fine, but I need create one ECS task definition per one running task, because the docker labels are the property of ECS TaskDefinition, not the ECS task itself. Is it possible to create only one TaskDefinition and pass Traefik rules in ECS task tags, within task key/value properties?
This will require some modification in Traefik source code, are the any other available options or ways this should be implemented, that I've missed, like API Gateway or Lambda#Edge? I have no experience with those technologies, real-world examples are more then welcome.
Solved by using Traefik REST API provider. External component, which runs the one-off tasks, can discover task internal IP and update Traefik configuration on-fly by pair traefik.frontend.rule = "Path:/user/1234" and task internal IP:port values in backends section
It should GET the Traefik configuration first from /api/providers/rest endpoint, remove or add corresponding part (if task was stopped or started), and update Traefik configuration by PUT method to the same endpoint.
{
"backends": {
"backend-serv1": {
"servers": {
"server-service-serv-test1-serv-test-4ca02d28c79b": {
"url": "http://172.16.0.5:32793"
}
}
},
"backend-serv2": {
"servers": {
"server-service-serv-test2-serv-test-279c0ba1959b": {
"url": "http://172.16.0.5:32792"
}
}
}
},
"frontends": {
"frontend-serv1": {
"entryPoints": [
"http"
],
"backend": "backend-serv1",
"routes": {
"route-frontend-serv1": {
"rule": "Path:/user/1234"
}
}
},
"frontend-serv2": {
"entryPoints": [
"http"
],
"backend": "backend-serv2",
"routes": {
"route-frontend-serv2": {
"rule": "Path:/user/5678"
}
}
}
}
}

Querying remote registry service on machine <IP Address> resulted in exception: Unable to change open service manager

My cluster Config file as follows
`
{
"name": "SampleCluster",
"clusterConfigurationVersion": "1.0.0",
"apiVersion": "01-2017",
"nodes":
[
{
"nodeName": "vm0",
"iPAddress": "here is my VPS ip",
"nodeTypeRef": "NodeType0",
"faultDomain": "fd:/dc1/r0",
"upgradeDomain": "UD0"
},
{
"nodeName": "vm1",
"iPAddress": "here is my another VPS ip",
"nodeTypeRef": "NodeType0",
"faultDomain": "fd:/dc1/r1",
"upgradeDomain": "UD1"
},
{
"nodeName": "vm2",
"iPAddress": "here is my another VPS ip",
"nodeTypeRef": "NodeType0",
"faultDomain": "fd:/dc1/r2",
"upgradeDomain": "UD2"
}
],
"properties": {
"reliabilityLevel": "Bronze",
"diagnosticsStore":
{
"metadata": "Please replace the diagnostics file share with an actual file share accessible from all cluster machines.",
"dataDeletionAgeInDays": "7",
"storeType": "FileShare",
"IsEncrypted": "false",
"connectionstring": "c:\\ProgramData\\SF\\DiagnosticsStore"
},
"nodeTypes": [
{
"name": "NodeType0",
"clientConnectionEndpointPort": "19000",
"clusterConnectionEndpointPort": "19001",
"leaseDriverEndpointPort": "19002",
"serviceConnectionEndpointPort": "19003",
"httpGatewayEndpointPort": "19080",
"reverseProxyEndpointPort": "19081",
"applicationPorts": {
"startPort": "20001",
"endPort": "20031"
},
"isPrimary": true
}
],
"fabricSettings": [
{
"name": "Setup",
"parameters": [
{
"name": "FabricDataRoot",
"value": "C:\\ProgramData\\SF"
},
{
"name": "FabricLogRoot",
"value": "C:\\ProgramData\\SF\\Log"
}
]
}
]
}
}
It is almost identical to standalone service fabric download demo file for untrusted cluster except my VPS ip. I enabled remote registry service.I ran the
\TestConfiguration.ps1 -ClusterConfigFilePath \ClusterConfig.Unsecure.MultiMachine.json but i got the following error.
Unable to change open service manager handle because 5
Unable to query service configuration because System.InvalidOperationException: Unable to change open service manager ha
ndle because 5
at System.Fabric.FabricDeployer.FabricDeployerServiceController.GetServiceStartupType(String machineName, String serv
iceName)
Querying remote registry service on machine <IP Address> resulted in exception: Unable to change open service manager
handle because 5.
Unable to change open service manager handle because 5
Unable to query service configuration because System.InvalidOperationException: Unable to change open service manager ha
ndle because 5
at System.Fabric.FabricDeployer.FabricDeployerServiceController.GetServiceStartupType(String machineName, String serv
iceName)
Querying remote registry service on machine <Another IP Address> resulted in exception: Unable to change open service manager
handle because 5.
Best Practices Analyzer determined environment has an issue. Please see additional BPA log output in DeploymentTraces
LocalAdminPrivilege : True
IsJsonValid : True
IsCabValid :
RequiredPortsOpen : True
RemoteRegistryAvailable : False
FirewallAvailable :
RpcCheckPassed :
NoConflictingInstallations :
FabricInstallable :
DataDrivesAvailable :
Passed : False
Test Config failed with exception: System.InvalidOperationException: Best Practices Analyzer determined environment has
an issue. Please see additional BPA log output in DeploymentTraces folder.
at System.Management.Automation.MshCommandRuntime.ThrowTerminatingError(ErrorRecord errorRecord)
I don't understand the problem.VPSs are not locally connected. All are public IP.I don't know, this may b an issue. how do I make virtual LAN among these VPS?Can anyone give me some direction about this error?Anyone helps me is greatly appreciated.
Edit: I used VM term insted of VPS.
Finally I make this working. Actually all the nodes are in a network, i thought it wasn't. I enable file sharing. I try to access the shared file from the node where I ran configuration test to the all other nodes. I have to give the credentials of logins. And then it works like a charm.

Prevent Octopus from Running a Deployment Script

I am deploying a package that contains a deploy.ps1 file. As you already know Octopus is running this script on deploying by default, I want to prevent it happening and run a custom script instead.
If you have a requirement like this, then it's better to move the powershell that starts the services to a separate build step and then tag the tentacles you want that script to run on.
In your deployment step for the service, set the start mode to "Manual"
Then have a step that starts the service, and scope that script to the environments / servers that you want to auto start
The code for the step template I use here is
{
"Id": "ActionTemplates-1",
"Name": "Enable and start service",
"Description": null,
"ActionType": "Octopus.Script",
"Version": 8,
"Properties": {
"Octopus.Action.Package.NuGetFeedId": "feeds-builtin",
"Octopus.Action.Script.Syntax": "PowerShell",
"Octopus.Action.Script.ScriptSource": "Inline",
"Octopus.Action.RunOnServer": "false",
"Octopus.Action.Script.ScriptBody": "$serviceName = $OctopusParameters[\"ServiceName\"]\n\nwrite-host \"the service is: \" $serviceName\n\n& \"sc.exe\" config $serviceName start= delayed-auto\n& \"sc.exe\" start $serviceName\n\n"
},
"Parameters": [
{
"Name": "ServiceName",
"Label": "Service Name",
"HelpText": null,
"DefaultValue": null,
"DisplaySettings": {
"Octopus.ControlType": "SingleLineText"
}
}
],
"$Meta": {
"ExportedAt": "2016-10-10T10:21:21.980Z",
"OctopusVersion": "3.3.2",
"Type": "ActionTemplate"
}
}
You may want to modify the step template as it will set the service to "Automatic - Delayed" and then start the service.
Are you able to move the script to a sub folder?
These scripts must be located in the root of your package
http://docs.octopusdeploy.com/display/OD/Custom+scripts
Alternatively - don't include your deploy.ps1 script in the deployment package if it should never be deployed.

When does EMR bootstrap actions run

I am creating an AWS cluster and I have a bootstrap action to change spark-defaults.conf.
Server is keep getting terminated saying
can't read /etc/spark/conf/spark-defaults.conf: No such file or
directory
Though if I skip this and check on server the files does exist. So I assume the order of things are not correct. I am using Spark 1.6.1 by provided EMR 4.5 so it should be installed by default.
Any clues?
Thanks!
You should not change Spark configurations in a bootstrap action. Instead you should specify any changes you have to spark-defaults in a special json file you need to add when launching the cluster. If you use the cli to launch, the command should look something like this:
aws --profile MY_PROFILE emr create-cluster \
--release-label emr-4.6.0 \
--applications Name=Spark Name=Ganglia Name=Zeppelin-Sandbox \
--name "Name of my cluster" \
--configurations file:///path/to/my/emr-configuration.json \
...
--bootstrap-actions ....
--step ...
In the emr-configuration.json file you then set your changes to spark-defaults. An example could be:
[
{
"Classification": "capacity-scheduler",
"Properties": {
"yarn.scheduler.capacity.resource-calculator": "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator"
}
},
{
"Classification": "spark",
"Properties": {
"maximizeResourceAllocation": "true"
}
},
{
"Classification": "spark-defaults",
"Properties": {
"spark.dynamicAllocation.enabled": "true",
"spark.executor.cores":"7"
}
}
]
The best way to achieve this goal is to use the Steps definition at a CloudFormation template for example... as Steps will run particularly at your Master node which holds the spark-default.conf file.

Resources