How do I make cloud-init startup scripts run every time my EC2 instance boots? - amazon-ec2

I have an EC2 instance running an AMI based on the Amazon Linux AMI. Like all such AMIs, it supports the cloud-init system for running startup scripts based on the User Data passed into every instance. In this particular case, my User Data input happens to be an Include file that sources several other startup scripts:
#include
http://s3.amazonaws.com/path/to/script/1
http://s3.amazonaws.com/path/to/script/2
The first time I boot my instance, the cloud-init startup script runs correctly. However, if I do a soft reboot of the instance (by running sudo shutdown -r now, for instance), the instance comes back up without running the startup script the second time around. If I go into the system logs, I can see:
Running cloud-init user-scripts
user-scripts already ran once-per-instance
[ OK ]
This is not what I want -- I can see the utility of having startup scripts that only run once per instance lifetime, but in my case these should run every time the instance starts up, like normal startup scripts.
I realize that one possible solution is to manually have my scripts insert themselves into rc.local after running the first time. This seems burdensome, however, since the cloud-init and rc.d environments are subtly different and I would now have to debug scripts on first launch and all subsequent launches separately.
Does anyone know how I can tell cloud-init to always run my scripts? This certainly sounds like something the designers of cloud-init would have considered.

In Ubuntu 11.10, 12.04 and later, you can achieve this by making the 'scripts-user' module run 'always'.
In /etc/cloud/cloud.cfg you'll see something like:
cloud_final_modules:
- rightscale_userdata
- scripts-per-once
- scripts-per-boot
- scripts-per-instance
- scripts-user
- keys-to-console
- phone-home
- final-message
This can be modified after boot, or cloud-config data overriding this stanza can be inserted via user data. That is, in user data you can provide:
#cloud-config
cloud_final_modules:
- rightscale_userdata
- scripts-per-once
- scripts-per-boot
- scripts-per-instance
- [scripts-user, always]
- keys-to-console
- phone-home
- final-message
That can also be '#included' as you've done in your description.
Unfortunately, right now, you cannot modify the 'cloud_final_modules', but only override it. I hope to add the ability to modify config sections at some point.
There is a bit more information on this in the cloud-config doc at
https://github.com/canonical/cloud-init/tree/master/doc/examples
Alternatively, you can put files in /var/lib/cloud/scripts/per-boot, and they'll be run by the 'scripts-per-boot' module.
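For example, a minimal sketch (the file name is arbitrary; the script just has to be executable):
#!/bin/sh
# Saved as /var/lib/cloud/scripts/per-boot/99-example.sh (hypothetical name).
# cloud-init's scripts-per-boot module runs everything in this directory on every boot.
echo "per-boot script ran at $(date)" >> /var/tmp/per-boot.log
Don't forget to chmod +x the file after creating it.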

In /etc/init.d/cloud-init-user-scripts, edit this line:
/usr/bin/cloud-init-run-module once-per-instance user-scripts execute run-parts ${SCRIPT_DIR} >/dev/null && success || failure
to
/usr/bin/cloud-init-run-module always user-scripts execute run-parts ${SCRIPT_DIR} >/dev/null && success || failure
Good luck!

cloud-init supports this natively now; see the runcmd vs. bootcmd command descriptions in the documentation (http://cloudinit.readthedocs.io/en/latest/topics/examples.html#run-commands-on-first-boot):
"runcmd":
#cloud-config
# run commands
# default: none
# runcmd contains a list of either lists or a string
# each item will be executed in order at rc.local like level with
# output to the console
# - runcmd only runs during the first boot
# - if the item is a list, the items will be properly executed as if
# passed to execve(3) (with the first arg as the command).
# - if the item is a string, it will be simply written to the file and
# will be interpreted by 'sh'
#
# Note, that the list has to be proper yaml, so you have to quote
# any characters yaml would eat (':' can be problematic)
runcmd:
- [ ls, -l, / ]
- [ sh, -xc, "echo $(date) ': hello world!'" ]
- [ sh, -c, echo "=========hello world'=========" ]
- ls -l /root
- [ wget, "http://slashdot.org", -O, /tmp/index.html ]
"bootcmd":
#cloud-config
# boot commands
# default: none
# this is very similar to runcmd, but commands run very early
# in the boot process, only slightly after a 'boothook' would run.
# bootcmd should really only be used for things that could not be
# done later in the boot process. bootcmd is very much like
# boothook, but possibly more friendly.
# - bootcmd will run on every boot
# - the INSTANCE_ID variable will be set to the current instance id.
# - you can use 'cloud-init-per' command to help only run once
bootcmd:
- echo 192.168.1.130 us.archive.ubuntu.com >> /etc/hosts
- [ cloud-init-per, once, mymkfs, mkfs, /dev/vdb ]
Also note the "cloud-init-per" command example in bootcmd. From its help:
Usage: cloud-init-per frequency name cmd [ arg1 [ arg2 [ ... ] ]
run cmd with arguments provided.
This utility can make it easier to use boothooks or bootcmd
on a per "once" or "always" basis.
If frequency is:
* once: run only once (do not re-run for new instance-id)
* instance: run only the first boot for a given instance-id
* always: run every boot
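For example (a sketch; the names 'pkg-refresh' and 'log-boot' are just labels cloud-init-per uses to track state):
# Run at most once for this instance-id:
cloud-init-per instance pkg-refresh yum -y update
# Run on every boot:
cloud-init-per always log-boot sh -c 'date >> /var/tmp/boots.log'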

One possibility, although somewhat hackish, is to delete the lock file that cloud-init uses to determine whether or not the user-script has already run. In my case (Amazon Linux AMI), this lock file is located in /var/lib/cloud/sem/ and is named user-scripts.i-7f3f1d11 (the suffix is the instance ID). Therefore, the following user-data script added to the end of the Include file will do the trick:
#!/bin/sh
rm /var/lib/cloud/sem/user-scripts.*
I'm not sure if this will have any adverse effects on anything else, but it has worked in my experiments.
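On newer cloud-init releases the per-instance semaphores appear to live under /var/lib/cloud/instances/<instance-id>/sem/ instead (the disk_setup workaround later on this page removes a semaphore from that same directory), so an equivalent, untested variant would be:
#!/bin/sh
# Assumption: on newer cloud-init the semaphore for the scripts-user module
# is named config_scripts_user; removing it makes user scripts run again.
rm -f /var/lib/cloud/instances/*/sem/config_scripts_user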

Prepend the cloud-config part shown below to your bash script in the user data. In this example it just prints "Hello World." to a file. Stop the instance before updating its user data.
The script:
Content-Type: multipart/mixed; boundary="//"
MIME-Version: 1.0
--//
Content-Type: text/cloud-config; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="cloud-config.txt"
#cloud-config
cloud_final_modules:
- [scripts-user, always]
--//
Content-Type: text/x-shellscript; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="userdata.txt"
#!/bin/bash
/bin/echo "Hello World." >> /var/tmp/sdksdfjsdlf
--//

I struggled with this issue for almost two days, tried all of the solutions I could find and finally, combining several approaches, came up with the following:
MyResource:
  Type: AWS::EC2::Instance
  Metadata:
    AWS::CloudFormation::Init:
      configSets:
        setup_process:
          - "prepare"
          - "run_for_instance"
      prepare:
        commands:
          01_apt_update:
            command: "apt-get update"
          02_clone_project:
            command: "mkdir -p /replication && rm -rf /replication/* && git clone https://github.com/awslabs/dynamodb-cross-region-library.git /replication/dynamodb-cross-region-library/"
          03_build_project:
            command: "mvn install -DskipTests=true"
            cwd: "/replication/dynamodb-cross-region-library"
          04_prepare_for_apac:
            command: "mkdir -p /replication/replication-west && rm -rf /replication/replication-west/* && cp /replication/dynamodb-cross-region-library/target/dynamodb-cross-region-replication-1.2.1.jar /replication/replication-west/replication-runner.jar"
      run_for_instance:
        commands:
          01_run:
            command: !Sub "java -jar replication-runner.jar --sourceRegion us-east-1 --sourceTable ${TableName} --destinationRegion ap-southeast-1 --destinationTable ${TableName} --taskName -us-ap >/dev/null 2>&1 &"
            cwd: "/replication/replication-west"
  Properties:
    UserData:
      Fn::Base64:
        !Sub |
          #cloud-config
          cloud_final_modules:
            - [scripts-user, always]
          runcmd:
            - /usr/local/bin/cfn-init -v -c setup_process --stack ${AWS::StackName} --resource MyResource --region ${AWS::Region}
            - /usr/local/bin/cfn-signal -e $? --stack ${AWS::StackName} --resource MyResource --region ${AWS::Region}
This is the setup for DynamoDb cross-region replication process.

If someone wants to do this with CDK, here's a Python example.
For Windows, user data has a special persist tag, but for Linux you need to use multipart user data to set up cloud-init first. This Linux example worked with the cloud-config part type (see the referenced blog) instead of cloud-boothook, which requires a cloud-init-per call (see also bootcmd) that I couldn't test (e.g. cloud-init-per always).
Linux example:
# Create some userdata commands
instance_userdata = ec2.UserData.for_linux()
instance_userdata.add_commands("apt update")
# ...
# Now create the first part to make cloud-init run it always
cinit_conf = ec2.UserData.for_linux()
cinit_conf.add_commands('#cloud-config')
cinit_conf.add_commands('cloud_final_modules:')
cinit_conf.add_commands('- [scripts-user, always]')
multipart_ud = ec2.MultipartUserData()
#### Setup to run every time instance starts
multipart_ud.add_part(ec2.MultipartBody.from_user_data(cinit_conf, content_type='text/cloud-config'))
#### Add the commands desired to run every time
multipart_ud.add_part(ec2.MultipartBody.from_user_data(instance_userdata))
ec2.Instance(
    self, "myec2",
    user_data=multipart_ud,
    # other required config...
)
Windows example:
instance_userdata = ec2.UserData.for_windows()
# Bootstrap
instance_userdata.add_commands("Write-Output 'Run some commands'")
# ...
# Making all the commands persistent - ie: running on each instance start
data_script = instance_userdata.render()
data_script += "<persist>true</persist>"
ud = ec2.UserData.custom(data_script)
ec2.Instance(
    self, "myWinec2",
    user_data=ud,
    # other required config...
)

Another approach is to use #cloud-boothook in your user data script. From the docs:
Cloud Boothook
Begins with #cloud-boothook or Content-Type: text/cloud-boothook.
This content is boothook data. It is stored in a file under /var/lib/cloud and then executed immediately. This is the earliest "hook" available. There is no mechanism provided for running it only one time. The boothook must take care of this itself. It is provided with the instance ID in the environment variable INSTANCE_ID. Use this variable to provide a once-per-instance set of boothook data.
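A rough, untested sketch of such a boothook (the marker path is arbitrary):
#cloud-boothook
#!/bin/sh
# Use INSTANCE_ID to make this boothook effectively once-per-instance.
marker="/var/lib/cloud/boothook.ran.${INSTANCE_ID}"
if [ ! -e "${marker}" ]; then
  echo "first boot of ${INSTANCE_ID}" >> /var/tmp/boothook.log
  touch "${marker}"
fi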

Related

How to use an anchor to prevent repetition of code sections?

Say I have a number of jobs that all run a similar series of script steps, but need a few variables that differ between them:
test a:
  stage: test
  tags:
    - a
  interruptible: true
  rules:
    - if: $CI_PIPELINE_SOURCE == 'merge_request_event'
  script:
    - echo "env is $(env)"
    - echo etcetera
    - echo and so on
    - docker build -t a -f Dockerfile.a .

test b:
  stage: test
  tags:
    - b
  interruptible: true
  rules:
    - if: $CI_PIPELINE_SOURCE == 'merge_request_event'
  script:
    - echo "env is $(env)"
    - echo etcetera
    - echo and so on
    - docker build -t b -f Dockerfile.b .
All I need is to be able to define e.g.
- docker build -t ${WHICH} -f Dockerfile.${which} .
If only I could make an anchor like:
.x: &which_ref
  - echo "env is $(env)"
  - echo etcetera
  - echo and so on
  - docker build -t $WHICH -f Dockerfile.$WHICH .
And include it there:
test a:
  script:
    - export WHICH=a
    <<: *which_ref
This doesn't work and in a yaml validator I get errors like
Error: YAMLException: cannot merge mappings; the provided source object is unacceptable
I also tried making an anchor that contains some entries under script inside of it:
.x: &which_ref
  script:
    - echo "env is $(env)"
    - echo etcetera
    - echo and so on
    - docker build -t $WHICH -f Dockerfile.$WHICH .
This means I have to include it one level higher up. That does not error, but all it accomplishes is that the later-declared script section overrides the first one.
So I'm losing hope. It seems like I will just need to abstract the sections away into their own shell scripts and call them with arguments or whatever.
The YAML merge key << is a non-standard extension for YAML 1.1, which was superseded by YAML 1.2 about 14 years ago. Its usage is discouraged.
The merge key works on mappings, not on sequences. It cannot deep-merge. Thus what you want to do is not possible to implement with it.
Generally, YAML isn't designed to process data; it just loads it. The merge key is an outlier and didn't find its way into the standard for good reasons. You need a pre- or postprocessor to do complex processing, and GitLab CI doesn't offer anything besides simple variable expansion, so you're out of luck.
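For what it's worth, a rough sketch of the shell-script approach the question falls back on (the file name build.sh is hypothetical); each job's script: section would then just call it with its own argument, e.g. ./build.sh a:
#!/bin/sh
# build.sh (hypothetical): shared steps, parameterized by the image name.
set -eu
WHICH="$1"
echo "env is $(env)"
echo etcetera
echo and so on
docker build -t "$WHICH" -f "Dockerfile.$WHICH" .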

Pre-commit - /dev/tty blocks all output text defined in the hook (e.g. through echo) before entering user input

I'm trying to create my own hook (defined in terraform_plan.sh; see terraform_plan.sh and .pre-commit-config.yaml below) that requires user input to decide whether the hook succeeds or fails (the hook asks the user to check something before committing). To enable user input, I added exec < /dev/tty, following "How do I prompt the user from within a commit-msg hook?".
The script looks like this (terraform_plan.sh):
#!/bin/sh
location=$(pwd)
echo "location: ${location}"
cd ./tf_build/
project_id=$(gcloud config get-value project)
credentials=$(gcloud secrets versions access latest --secret="application-default-credentials")
echo "PROJECT_ID: ${project_id}"
echo "CREDENTIALS: ${credentials}"
terraform plan -var "PROJECT_ID=${project_id}" -var "APPLICATION_DEFAULT_CREDENTIALS=${credentials}"
exec < /dev/tty
read -p "Do yu agree this plan? (Y/N): " answer
echo "answer: ${answer}"
# for testing purpose, assign 0 directly
exit 0
I expect that the prompt Do you agree this plan? (Y/N) should appear before I can enter my answer. But actually nothing is shown and it just hangs there, waiting for input.
(.venv) ➜ ✗ git commit -m "test"
sqlfluff-lint........................................(no files to check)Skipped
sqlfluff-fix.........................................(no files to check)Skipped
black................................................(no files to check)Skipped
isort................................................(no files to check)Skipped
docformatter.........................................(no files to check)Skipped
flake8...............................................(no files to check)Skipped
blackdoc.............................................(no files to check)Skipped
Terraform plan...........................................................
Only after I give the input "Y" do all the output strings defined in this hook (e.g. output from echo, terraform plan) come out.
Terraform plan...........................................................Y
Passed
- hook id: terraform_plan
- duration: 222.33s
location: [remove due to privacy]
PROJECT_ID: [remove due to privacy]
CREDENTIALS: [remove due to privacy]
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
+ create
Terraform will perform the following actions:
# google_bigquery_table.bar will be created
+ resource "google_bigquery_table" "bar" {
+ creation_time = (known after apply)
+ dataset_id = "default_dataset"
+ deletion_protection = true
...
I also tried read -p "Do you agree this plan? (Y/N): " answer < /dev/tty, but got the same issue.
Here is the relevant part of my .pre-commit-config.yaml file.
repos:
  - repo: local
    hooks:
      # there are other hooks above, removed for readability
      - id: terraform_plan
        name: Terraform plan
        entry: hooks/terraform_plan.sh
        language: script
        verbose: true
        files: (\.tf|\.tfvars)$
        exclude: \.terraform\/.*$
  - repo: ../pre-commit-terraform
    rev: d7e049d0b72eebcb09b719bb296589a47f4fa806
    hooks:
      - id: terraform_fmt
      - id: terraform_tflint
      - id: terraform_validate
        args: [--args=-no-color]
So far I do not know what the root cause is or how to solve it. Any help or suggestions would be appreciated. Thanks.
pre-commit intentionally does not allow interactivity so there is no working solution within the framework. you can sidestep the framework and utilize a legacy shell script (.git/hooks/pre-commit.legacy) but I would also not recommend that:
terraform plan is also kind of a poor choice for a pre-commit hook -- especially if you are blocking the user to confirm it. they are likely to be either surprised by such a hook or fatigued by it and will ignore the output (mash yes) or ignore the entire process (SKIP / --no-verify).
disclaimer: I wrote pre-commit

.ssh/config: line 1: Bad configuration option: \342\200\234host

I am trying to deploy code from GitLab to an EC2 instance. However, I am getting the following errors when I run the pipeline:
/home/gitlab-runner/.ssh/config: line 1: Bad configuration option: \342\200\234host
/home/gitlab-runner/.ssh/config: terminating, 1 bad configuration options
Here is my .gitlab-ci.yml file that I am using.
stages:
  - QAenv
  - Prod
Deploy to Staging:
  stage: QAenv
  tags:
    - QA
  before_script:
    # Generates the SSH key to connect to the AWS unit.
    - mkdir -p ~/.ssh
    - echo -e “$SSH_PRIVATE_KEY” > ~/.ssh/id_rsa
    # Sets the permission to 600 to prevent a problem with AWS
    # that it’s too unprotected.
    - chmod 600 ~/.ssh/id_rsa
    - 'echo -e “Host *\n\tStrictHostKeyChecking no\n\n” > ~/.ssh/config'
  script:
    - bash ./gitlab-deploy/.gitlab-deploy.staging.sh
  environment:
    name: QAenv
    # Exposes a button that, when clicked, takes you to the defined URL:
    url: https://your.url.com
Below is my .gitlab-deploy.staging.sh file that I have set up to deploy to my server.
#!/bin/bash
# Get servers list:
set -f
# Variables from GitLab server:
# Note: They can’t have spaces!!
string=$DEPLOY_SERVER
array=(${string//,/ })
for i in "${!array[@]}"; do
  echo "Deploy project on server ${array[i]}"
  ssh ubuntu@${array[i]} "cd /opt/bau && git pull origin master"
done
I checked my .ssh/config file contents and below is what I can see.
ubuntu#:/home/gitlab-runner/.ssh$ cat config
“Host *ntStrictHostKeyChecking nonn”
Any ideas about what I am doing wrong and what changes I should make?
The problem is with:
ubuntu@ip-172-31-42-114:/home/gitlab-runner/.ssh$ cat config
“Host *ntStrictHostKeyChecking nonn”
There are some Unicode characters here, which usually appear when code is copy-pasted from a document or a webpage.
In your case you can see this “ character specifically in the output.
Replace it with " and check for others in your config; after updating, it should work.
There are more details in this question: getting errors stray ‘\342’ and ‘\200’ and ‘\214’.
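As a sketch, writing the file with printf and plain ASCII quotes sidesteps the problem, and you can check for leftover non-ASCII bytes afterwards (assumes GNU grep on the runner):
# Plain ASCII only; no smart quotes to trip up ssh.
printf 'Host *\n\tStrictHostKeyChecking no\n\n' > ~/.ssh/config
# Any smart quotes or other non-ASCII bytes will show up here.
grep -nP '[^\x00-\x7F]' ~/.ssh/config || echo "config is clean"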

cloud-init: delay disk_setup and fs_setup

I have a cloud-init file that sets up all requirements for our AWS instances, and part of those requirements is formatting and mounting an EBS volume. The issue is that on some instances volume attachment occurs after the instance is up, so when cloud-init executes, the volume /dev/xvdf does not yet exist and it fails.
I have something like:
#cloud-config
resize_rootfs: false
disk_setup:
  /dev/xvdf:
    table_type: 'gpt'
    layout: true
    overwrite: false
fs_setup:
  - label: DATA
    filesystem: 'ext4'
    device: '/dev/xvdf'
    partition: 'auto'
mounts:
  - [xvdf, /data, auto, "defaults,discard", "0", "0"]
And I would like to have something like a sleep 60 or similar before the disk configuration block.
If the whole cloud-init execution can be delayed, that would also work for me.
Also, I'm using terraform to create the infrastructure.
Thanks!
I guess cloud-init does have an option for running ad hoc commands. Have a look at this link:
https://cloudinit.readthedocs.io/en/latest/topics/modules.html?highlight=runcmd#runcmd
Not sure what your code looks like, but I just tried to pass the below as user_data in AWS and could see that the init script sleeps for 1000 seconds... (I just added a couple of echo statements to check later). I guess you can add a little more logic as well to verify the presence of the volume; a sketch of such a check follows the snippet below.
#cloud-config
runcmd:
- [ sh, -c, "echo before sleep:`date` >> /tmp/user_data.log" ]
- [ sh, -c, "sleep 1000" ]
- [ sh, -c, "echo after sleep:`date` >> /tmp/user_data.log" ]
<Rest of the script>
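The "little more logic" to verify the volume could look something like the following sketch, which could be dropped into one of those runcmd entries (adjust the device name if your instance type exposes EBS volumes as /dev/nvme*):
#!/bin/sh
# Wait up to ~5 minutes for the EBS device to show up before touching it.
n=0
until [ -b /dev/xvdf ] || [ "$n" -ge 60 ]; do
  n=$((n + 1))
  sleep 5
done
[ -b /dev/xvdf ] || { echo "/dev/xvdf never appeared" >&2; exit 1; }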
I was able to resolve the issue with two changes:
1. Changed the mount options, adding the nofail option.
2. Added a line to the runcmd block, deleting the semaphore file for disk_setup.
So my new cloud-init file now looks like this:
#cloud-config
resize_rootfs: false
disk_setup:
  /dev/xvdf:
    table_type: 'gpt'
    layout: true
    overwrite: false
fs_setup:
  - label: DATA
    filesystem: 'ext4'
    device: '/dev/xvdf'
    partition: 'auto'
mounts:
  - [xvdf, /data, auto, "defaults,discard,nofail", "0", "0"]
runcmd:
  - [rm, -f, /var/lib/cloud/instances/*/sem/config_disk_setup]
power_state:
  mode: reboot
  timeout: 30
It will reboot, then it will execute the disk_setup module once more. By this time, the volume will be attached so the operation won't fail.
I guess this is kind of a hacky way to solve this, so if someone has a better answer (like how to delay the whole cloud-init execution) please share it.

knife vsphere requests root password - is unattended execution possible?

Is there any way to run knife vsphere for unattended execution? I have a deploy shell script which I am using to help me:
cat deploy-production-20-vm.sh
#!/bin/bash
##############################################
# These are machine dependent variables (need to change)
##############################################
HOST_NAME=$1
IP_ADDRESS="$2/24"
CHEF_BOOTSTRAP_IP_ADDRESS="$2"
RUNLIST=\"$3\"
CHEF_HOST=$HOSTNAME.my.lan
##############################################
# These are pseudo-environment independent variables (could change)
##############################################
DATASTORE="dcesxds04"
##############################################
# These are environment dependent variables (should not change per env)
##############################################
TEMPLATE="\"CentOS\""
NETWORK="\"VM Network\""
CLUSTER="ProdCluster01" #knife-vsphere calls this a resource pool
GATEWAY="10.7.20.1"
DNS="\"10.7.20.11,10.8.20.11,10.6.20.11\""
##############################################
# the magic
##############################################
VM_CLONE_CMD="knife vsphere vm clone $HOST_NAME \
--template $TEMPLATE \
--cips $IP_ADDRESS \
--vsdc MarkleyDC \
--datastore $DATASTORE \
--cvlan $NETWORK \
--resource-pool $CLUSTER \
--cgw $GATEWAY \
--cdnsips $DNS \
--start true \
--bootstrap true \
--fqdn $CHEF_BOOTSTRAP_IP_ADDRESS \
--chost $HOST_NAME \
--cdomain my.lan \
--run-list=$RUNLIST"
echo $VM_CLONE_CMD
eval $VM_CLONE_CMD
Which echoes (as a single line):
knife vsphere vm clone dcbsmtest --template "CentOS" --cips 10.7.20.84/24
--vsdc MarkleyDC --datastore dcesxds04 --cvlan "VM Network"
--resource-pool ProdCluster01 --cgw 10.7.20.1
--cdnsips "10.7.20.11,10.8.20.11,10.6.20.11" --start true
--bootstrap true --fqdn 10.7.20.84 --chost dcbsmtest --cdomain my.lan
--run-list="role[my-env-prod-server]"
When it runs it outputs:
Cloning template CentOS Template to new VM dcbsmtest
Finished creating virtual machine dcbsmtest
Powered on virtual machine dcbsmtest
Waiting for sshd...done
Doing old-style registration with the validation key at /home/me/chef-repo/.chef/our-validator.pem...
Delete your validation key in order to use your user credentials instead
Connecting to 10.7.20.84
root@10.7.20.84's password:
If I step away from my desk while it is prompting for the password, it sometimes times out, the connection is lost, and Chef doesn't bootstrap. I would also like to be able to automate all of this so it can scale elastically based on system needs, which won't work with attended execution.
The idea I am going to run with, unless a better solution is provided, is to have a default password in the template, pass it on the command line to knife, and have Chef change the password once the build is complete, minimizing the exposure of a hard-coded password in the bash script controlling knife...
Update: I wanted to add that this is working like a charm. Ideally we could have changed the CentOS template we were deploying, but it wasn't possible here, so this is a fine alternative (as we changed the root password after deploy anyhow).
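For reference, the non-interactive clone could then look something like the sketch below. It assumes knife-vsphere accepts --ssh-user/--ssh-password for the bootstrap step (verify with knife vsphere vm clone --help on your version) and that the template's default password comes from an environment variable (TEMPLATE_ROOT_PW is a hypothetical name) rather than being hard coded:
# Append non-interactive credentials to the existing clone command.
# TEMPLATE_ROOT_PW is exported by the calling environment (hypothetical name);
# Chef rotates the real password once the node is bootstrapped.
VM_CLONE_CMD="$VM_CLONE_CMD --ssh-user root --ssh-password $TEMPLATE_ROOT_PW"
eval $VM_CLONE_CMD   # skip echoing the command once it contains a password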
