cloud-init: delay disk_setup and fs_setup - amazon-ec2

I have a cloud-init file that sets up all requirements for our AWS instances, and part of those requirements is formating and mounting an EBS volume. The issue is that on some instances volume attachment occurs after the instance is up, so when cloud-init executes the volume /dev/xvdf does not yet exist and it fails.
I have something like:
#cloud-config
resize_rootfs: false
disk_setup:
/dev/xvdf:
table_type: 'gpt'
layout: true
overwrite: false
fs_setup:
- label: DATA
filesystem: 'ext4'
device: '/dev/xvdf'
partition: 'auto'
mounts:
- [xvdf, /data, auto, "defaults,discard", "0", "0"]
And would like to have something like a sleep 60 or something like that before the disk configuration block.
If the whole cloud-init execution can be delayed, that would also work for me.
Also, I'm using terraform to create the infrastructure.
Thanks!

I guess cloud-init does have an option for running adhoc commands. have a look into this link.
https://cloudinit.readthedocs.io/en/latest/topics/modules.html?highlight=runcmd#runcmd
Not sure what your code looks like, but I just tried to pass the below as user_data in AWS and could see that the init script sleep for 1000 seconds... ( Just added a couple of echo statements to check later). I guess you can add a little more logic as well to verify the presence of the volume.
#cloud-config
runcmd:
- [ sh, -c, "echo before sleep:`date` >> /tmp/user_data.log" ]
- [ sh, -c, "sleep 1000" ]
- [ sh, -c, "echo after sleep:`date` >> /tmp/user_data.log" ]
<Rest of the script>

I was able to resolve the issue with two changes:
Changed the mount options, adding nofail option.
Added a line to the runcmd block, deleting the semaphore file for disk_setup.
So my new cloud-init file now looks like this:
#cloud-config
resize_rootfs: false
disk_setup:
/dev/xvdf:
table_type: 'gpt'
layout: true
overwrite: false
fs_setup:
- label: DATA
filesystem: 'ext4'
device: '/dev/xvdf'
partition: 'auto'
mounts:
- [xvdf, /data, auto, "defaults,discard", "0", "0"]
runcmd:
- [rm, -f, /var/lib/cloud/instances/*/sem/config_disk_setup]
power_state:
mode: reboot
timeout: 30
It will reboot, then it will execute the disk_setup module once more. By this time, the volume will be attached so the operation won't fail.
I guess this is kind of a hacky way to solve this, so if someone has a better answer (like how to delay the whole cloud-init execution) please share it.

Related

Pre-commit - /dev/tty block all output text defined in hook (ex: through echo) before entering user input

I'm trying to create my own hook (defined in terraform_plan.sh, can refer in terraform_plan.sh and .pre-commit-config.yaml below) and require user input to determine if this hook success or fail (This hook is about some checking by the user before commit). To activate user input function, I add exec < /dev/tty according to How do I prompt the user from within a commit-msg hook?.
The snippet code looks like this (terraform_plan.sh).
#!/bin/sh
location=$(pwd)
echo "location: ${location}"
cd ./tf_build/
project_id=$(gcloud config get-value project)
credentials=$(gcloud secrets versions access latest --secret="application-default-credentials")
echo "PROJECT_ID: ${project_id}"
echo "CREDENTIALS: ${credentials}"
terraform plan -var "PROJECT_ID=${project_id}" -var "APPLICATION_DEFAULT_CREDENTIALS=${credentials}"
exec < /dev/tty
read -p "Do yu agree this plan? (Y/N): " answer
echo "answer: ${answer}"
# for testing purpose, assign 0 directly
exit 0
I expect that the prompt Do you agree this plan? (Y/N) should appear before I can enter my answer. But actually nothing shown and it just hangs there waiting for the input.
(.venv) ➜ ✗ git commit -m "test"
sqlfluff-lint........................................(no files to check)Skipped
sqlfluff-fix.........................................(no files to check)Skipped
black................................................(no files to check)Skipped
isort................................................(no files to check)Skipped
docformatter.........................................(no files to check)Skipped
flake8...............................................(no files to check)Skipped
blackdoc.............................................(no files to check)Skipped
Terraform plan...........................................................
Only after I give an input "Y", all output strings defined in this hook (ex: output string through echo, terraform plan) comes out.
Terraform plan...........................................................Y
Passed
- hook id: terraform_plan
- duration: 222.33s
location: [remove due to privacy]
PROJECT_ID: [remove due to privacy]
CREDENTIALS: [remove due to privacy]
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
+ create
Terraform will perform the following actions:
# google_bigquery_table.bar will be created
+ resource "google_bigquery_table" "bar" {
+ creation_time = (known after apply)
+ dataset_id = "default_dataset"
+ deletion_protection = true
...
I also try read -p "Do yu agree this plan? (Y/N): " answer < /dev/tty, but get the same issue.
Here is my part of .pre-commit-config.yaml config file.
repos:
- repo: local
hooks:
# there are other hooks above, remove them for easy to read
- id: terraform_plan
name: Terraform plan
entry: hooks/terraform_plan.sh
language: script
verbose: true
files: (\.tf|\.tfvars)$
exclude: \.terraform\/.*$
- repo: ../pre-commit-terraform
rev: d7e049d0b72eebcb09b719bb296589a47f4fa806
hooks:
- id: terraform_fmt
- id: terraform_tflint
- id: terraform_validate
args: [--args=-no-color]
So far I do not know what the root cause is and how to solve it. Seek for someone's help and suggestion. Thanks.
pre-commit intentionally does not allow interactivity so there is no working solution within the framework. you can sidestep the framework and utilize a legacy shell script (.git/hooks/pre-commit.legacy) but I would also not recommend that:
terraform plan is also kind of a poor choice for a pre-commit hook -- especially if you are blocking the user to confirm it. they are likely to be either surprised by such a hook or fatigued by it and will ignore the output (mash yes) or ignore the entire process (SKIP / --no-verify).
disclaimer: I wrote pre-commit

How to check a log for a string and verify app status?

I am a newbie to Ansible and I want to verify if an application (non service) is running, if not, start it. So basically look for the output of "splunkd is running".
Output of 'status' command is
splunkd is running (PID: 15111).
splunk helpers are running (PIDs: 15214 15420 15431 15500).
Also want to check said app's log file for a string "does not exists!" and if it exists restart app (until that string no longer exists in log - the log currently does rotate upon restart). The servers in question only exists in Production and I don't feel too comfortable executing code there for the 1st time. Below is what I have so far, I feel like the 1st block and 3rd block need editing. Thanks for any assistance and feedback!
- name: Splunk status
command: sudo /splunk/bin/splunk status
changed_when: false
- name: Read log
shell: cat /splunk/log/splunkd.log
register: splunk_log
- name: Restart if "does not exists!", exists
command: sudo /splunk/bin/splunk restart
until: splunk_log.stdout.find('does not exists!') == 0
#when: splunk_log.stdout.find('does not exists!') != -1
debug: msg="does not exists! exists, restarting"
retries: 5
delay: 60
I don't understand the first task but I think you're trying to do something like this example of changed_when. Also, for task three follow this when-statement.

Terraform unable to pass in list as a variable in bash Linux

As part of my CI/CD pipeline, I am running terraform and attempting to pass in a local variable. Unfortunately, the variable name is just taken literally.
I have tried changing the quotes around but this does not seem to do anything.
I am on Linux Ubuntu and version 0.11.14 of terraform
The bash code is
azip=1.1.1.1
Calling the plan command by:
terraform plan -var 'ip_azure=["$azip2"]'
The following plan is displayed:
Terraform will perform the following actions:
+ aws_route53_record.dns_azure
id: <computed>
allow_overwrite: <computed>
fqdn: <computed>
name: "dns_azure"
records.#: "1"
records.767631455: "$azip2"
ttl: "60"
type: "A"
zone_id: "Z2X9DFDU4LXXC6"
Plan: 1 to add, 0 to change, 0 to destroy.
I would expect
records.767631455: "1.1.1.1"
Whenever I put the ip address directly into the plan e.g.
terraform plan -var 'ip_azure=["1.1.1.1"]'
I get the expected result
Variables between single quotes don't get evaluated in Bash. If you use...
terraform plan -var "ip_azure=[${azip2}]"
... instead, it should work. Probably you don't even need the {} in this case.
Thanks to #bellackn for giving me the correct direction
I ran the following
terraform plan -var "ip_azure=[\"${azip2}\"]"
and it worked. Looks like I need to escape the double quotes

Proper syntax of write_files directive in cloud config?

I'm trying to get a cloud config script working properly with my DigitalOcean droplet, but I'm testing on local lxc containers in the interim.
One consistent problem I have is that I can never get the write_files directive working properly for more than one file. It seems to behave in weird ways that I cannot understand.
For example, this configuration is incorrect, and only outputs a single file (.tarsnaprc) in /tmp:
#cloud-config
users:
- name: julian
shell: /bin/bash
ssh_authorized_keys:
- ssh-rsa myrsakeygoeshere julian#hostname
write_files:
- path: /tmp/.tarsnaprc
permissions: "0644"
content: |
cachedir /home/julian/tarsnap-cache
keyfile /home/julian/tarsnap.key
nodump
print-stats
checkpoint-bytes 1G
owner: julian:julian
- path: /tmp/lxc
content: |
lxc.id_map = u 0 100000 65536
lxc.id_map = g 0 100000 65536
lxc.network.type = veth
lxc.network.link = lxcbr0
permissions: "0644"
However, if I swap the two items in the write_files array, it magically works, and creates both files, .tarsnaprc and lxc. What am I doing wrong, do I have a syntax error?
It may be too late, as it was posted 1 year ago. The problem is setting the owner in /tmp/.tarsnaprc as the user does not exist when the file is created.
Check cloud-init: What is the execution order of cloud-config directives? answer that clearly explains the order of cloud-config directives.
Do not write files under /tmp during boot because of a race with systemd-tmpfiles-clean that can cause temp files to get cleaned during the early boot process. Use /run/somedir instead to avoid race LP:1707222.
ref: https://cloudinit.readthedocs.io/en/latest/topics/modules.html#write-files
Came here because of using canonicals multipass. Nowadays the answers of #rvelaz and #Christian still hint to the right direction. The corrected example whould look like this:
#cloud-config
users:
- name: julian
shell: /bin/bash
ssh_authorized_keys:
- ssh-rsa myrsakeygoeshere julian#hostname
write_files:
# not writing to /tmp
- path: /data/.tarsnaprc
permissions: "0644"
content: |
cachedir /home/julian/tarsnap-cache
keyfile /home/julian/tarsnap.key
nodump
print-stats
checkpoint-bytes 1G
# at execution time, this owner does not yet exist (see runcmd)
# owner: julian:julian
- path: /data/lxc
content: |
lxc.id_map = u 0 100000 65536
lxc.id_map = g 0 100000 65536
lxc.network.type = veth
lxc.network.link = lxcbr0
permissions: "0644"
runcmd:
- "chown julian:julian /data/lxc /data/.tarsnaprc"

How do I make cloud-init startup scripts run every time my EC2 instance boots?

I have an EC2 instance running an AMI based on the Amazon Linux AMI. Like all such AMIs, it supports the cloud-init system for running startup scripts based on the User Data passed into every instance. In this particular case, my User Data input happens to be an Include file that sources several other startup scripts:
#include
http://s3.amazonaws.com/path/to/script/1
http://s3.amazonaws.com/path/to/script/2
The first time I boot my instance, the cloud-init startup script runs correctly. However, if I do a soft reboot of the instance (by running sudo shutdown -r now, for instance), the instance comes back up without running the startup script the second time around. If I go into the system logs, I can see:
Running cloud-init user-scripts
user-scripts already ran once-per-instance
[ OK ]
This is not what I want -- I can see the utility of having startup scripts that only run once per instance lifetime, but in my case these should run every time the instance starts up, like normal startup scripts.
I realize that one possible solution is to manually have my scripts insert themselves into rc.local after running the first time. This seems burdensome, however, since the cloud-init and rc.d environments are subtly different and I would now have to debug scripts on first launch and all subsequent launches separately.
Does anyone know how I can tell cloud-init to always run my scripts? This certainly sounds like something the designers of cloud-init would have considered.
In 11.10, 12.04 and later, you can achieve this by making the 'scripts-user' run 'always'.
In /etc/cloud/cloud.cfg you'll see something like:
cloud_final_modules:
- rightscale_userdata
- scripts-per-once
- scripts-per-boot
- scripts-per-instance
- scripts-user
- keys-to-console
- phone-home
- final-message
This can be modified after boot, or cloud-config data overriding this stanza can be inserted via user-data. Ie, in user-data you can provide:
#cloud-config
cloud_final_modules:
- rightscale_userdata
- scripts-per-once
- scripts-per-boot
- scripts-per-instance
- [scripts-user, always]
- keys-to-console
- phone-home
- final-message
That can also be '#included' as you've done in your description.
Unfortunately, right now, you cannot modify the 'cloud_final_modules', but only override it. I hope to add the ability to modify config sections at some point.
There is a bit more information on this in the cloud-config doc at
https://github.com/canonical/cloud-init/tree/master/doc/examples
Alternatively, you can put files in /var/lib/cloud/scripts/per-boot , and they'll be run by the 'scripts-per-boot' path.
In /etc/init.d/cloud-init-user-scripts, edit this line:
/usr/bin/cloud-init-run-module once-per-instance user-scripts execute run-parts ${SCRIPT_DIR} >/dev/null && success || failure
to
/usr/bin/cloud-init-run-module always user-scripts execute run-parts ${SCRIPT_DIR} >/dev/null && success || failure
Good luck !
cloud-init supports this now natively, see runcmd vs bootcmd command descriptions in the documentation (http://cloudinit.readthedocs.io/en/latest/topics/examples.html#run-commands-on-first-boot):
"runcmd":
#cloud-config
# run commands
# default: none
# runcmd contains a list of either lists or a string
# each item will be executed in order at rc.local like level with
# output to the console
# - runcmd only runs during the first boot
# - if the item is a list, the items will be properly executed as if
# passed to execve(3) (with the first arg as the command).
# - if the item is a string, it will be simply written to the file and
# will be interpreted by 'sh'
#
# Note, that the list has to be proper yaml, so you have to quote
# any characters yaml would eat (':' can be problematic)
runcmd:
- [ ls, -l, / ]
- [ sh, -xc, "echo $(date) ': hello world!'" ]
- [ sh, -c, echo "=========hello world'=========" ]
- ls -l /root
- [ wget, "http://slashdot.org", -O, /tmp/index.html ]
"bootcmd":
#cloud-config
# boot commands
# default: none
# this is very similar to runcmd, but commands run very early
# in the boot process, only slightly after a 'boothook' would run.
# bootcmd should really only be used for things that could not be
# done later in the boot process. bootcmd is very much like
# boothook, but possibly with more friendly.
# - bootcmd will run on every boot
# - the INSTANCE_ID variable will be set to the current instance id.
# - you can use 'cloud-init-per' command to help only run once
bootcmd:
- echo 192.168.1.130 us.archive.ubuntu.com >> /etc/hosts
- [ cloud-init-per, once, mymkfs, mkfs, /dev/vdb ]
also note the "cloud-init-per" command example in bootcmd. From it's help:
Usage: cloud-init-per frequency name cmd [ arg1 [ arg2 [ ... ] ]
run cmd with arguments provided.
This utility can make it easier to use boothooks or bootcmd
on a per "once" or "always" basis.
If frequency is:
* once: run only once (do not re-run for new instance-id)
* instance: run only the first boot for a given instance-id
* always: run every boot
One possibility, although somewhat hackish, is to delete the lock file that cloud-init uses to determine whether or not the user-script has already run. In my case (Amazon Linux AMI), this lock file is located in /var/lib/cloud/sem/ and is named user-scripts.i-7f3f1d11 (the hash part at the end changes every boot). Therefore, the following user-data script added to the end of the Include file will do the trick:
#!/bin/sh
rm /var/lib/cloud/sem/user-scripts.*
I'm not sure if this will have any adverse effects on anything else, but it has worked in my experiments.
please use the below script above your bash script.
example: here m printing hello world to my file
stop instance before adding to userdata
script
Content-Type: multipart/mixed; boundary="//"
MIME-Version: 1.0
--//
Content-Type: text/cloud-config; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="cloud-config.txt"
#cloud-config
cloud_final_modules:
- [scripts-user, always]
--//
Content-Type: text/x-shellscript; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="userdata.txt"
#!/bin/bash
/bin/echo "Hello World." >> /var/tmp/sdksdfjsdlf
--//
I struggled with this issue for almost two days, tried all of the solutions I could find and finally, combining several approaches, came up with the following:
MyResource:
Type: AWS::EC2::Instance
Metadata:
AWS::CloudFormation::Init:
configSets:
setup_process:
- "prepare"
- "run_for_instance"
prepare:
commands:
01_apt_update:
command: "apt-get update"
02_clone_project:
command: "mkdir -p /replication && rm -rf /replication/* && git clone https://github.com/awslabs/dynamodb-cross-region-library.git /replication/dynamodb-cross-region-library/"
03_build_project:
command: "mvn install -DskipTests=true"
cwd: "/replication/dynamodb-cross-region-library"
04_prepare_for_apac:
command: "mkdir -p /replication/replication-west && rm -rf /replication/replication-west/* && cp /replication/dynamodb-cross-region-library/target/dynamodb-cross-region-replication-1.2.1.jar /replication/replication-west/replication-runner.jar"
run_for_instance:
commands:
01_run:
command: !Sub "java -jar replication-runner.jar --sourceRegion us-east-1 --sourceTable ${TableName} --destinationRegion ap-southeast-1 --destinationTable ${TableName} --taskName -us-ap >/dev/null 2>&1 &"
cwd: "/replication/replication-west"
Properties:
UserData:
Fn::Base64:
!Sub |
#cloud-config
cloud_final_modules:
- [scripts-user, always]
runcmd:
- /usr/local/bin/cfn-init -v -c setup_process --stack ${AWS::StackName} --resource MyResource --region ${AWS::Region}
- /usr/local/bin/cfn-signal -e $? --stack ${AWS::StackName} --resource MyResource --region ${AWS::Region}
This is the setup for DynamoDb cross-region replication process.
If someone wants to do this on CDK, here's a python example.
For Windows, user data has a special persist tag, but for Linux, you need to use MultiPart User data to setup cloud-init first. This Linux example worked with cloud-config (see ref blog) part type instead of cloud-boothook which requires a cloud-init-per (see also bootcmd) call I couldn't test out (eg: cloud-init-pre always).
Linux example:
# Create some userdata commands
instance_userdata = ec2.UserData.for_linux()
instance_userdata.add_commands("apt update")
# ...
# Now create the first part to make cloud-init run it always
cinit_conf = ec2.UserData.for_linux();
cinit_conf .add_commands('#cloud-config');
cinit_conf .add_commands('cloud_final_modules:');
cinit_conf .add_commands('- [scripts-user, always]');
multipart_ud = ec2.MultipartUserData()
#### Setup to run every time instance starts
multipart_ud.add_part(ec2.MultipartBody.from_user_data(cinit_conf , content_type='text/cloud-config'))
#### Add the commands desired to run every time
multipart_ud.add_part(ec2.MultipartBody.from_user_data(instance_userdata));
ec2.Instance(
self, "myec2",
userdata=multipart_ud,
#other required config...
)
Windows example:
instance_userdata = ec2.UserData.for_windows()
# Bootstrap
instance_userdata.add_commands("Write-Output 'Run some commands'")
# ...
# Making all the commands persistent - ie: running on each instance start
data_script = instance_userdata.render()
data_script += "<persist>true</persist>"
ud = ec2.UserData.custom(data_script)
ec2.Instance(
self, "myWinec2",
userdata=ud,
#other required config...
)
Another approach is to use #cloud-boothook in your user data script. From the docs:
Cloud Boothook
Begins with #cloud-boothook or Content-Type: text/cloud-boothook.
This content is boothook data. It is stored in a file under /var/lib/cloud and then executed immediately.
This is the earliest "hook" available. There is no mechanism provided for running it only one time. The boothook must take care
of this itself. It is provided with the instance ID in the environment
variable INSTANCE_ID. Use this variable to provide a once-per-instance
set of boothook data.

Resources